Balanced Data Clustering Algorithm for Both Hard and Soft Clustering

Purnendu Das, Bishwa Ranjan Roy, Saptarshi Paul

Dept. of Computer Science, Assam University, Silchar, India.

Correspondence should be addressed to: brroy88@gmail.com.

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 2, Page no. 176-183, Feb 2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i2.176183

Published online on Feb 28, 2018

Copyright © Purnendu Das, Bishwa Ranjan Roy, Saptarshi Paul. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Citation

IEEE Style Citation: Purnendu Das, Bishwa Ranjan Roy, Saptarshi Paul, “Balanced Data Clustering Algorithm for Both Hard and Soft Clustering,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.2, pp.176-183, 2018.

MLA Citation

MLA Style Citation: Purnendu Das, Bishwa Ranjan Roy, Saptarshi Paul "Balanced Data Clustering Algorithm for Both Hard and Soft Clustering." International Journal of Computer Sciences and Engineering 6.2 (2018): 176-183.

APA Citation

APA Style Citation: Purnendu Das, Bishwa Ranjan Roy, Saptarshi Paul, (2018). Balanced Data Clustering Algorithm for Both Hard and Soft Clustering. International Journal of Computer Sciences and Engineering, 6(2), 176-183.

BibTex Citation

BibTex Style Citation:
@article{Das_2018,
author = {Purnendu Das and Bishwa Ranjan Roy and Saptarshi Paul},
title = {Balanced Data Clustering Algorithm for Both Hard and Soft Clustering},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {2 2018},
volume = {6},
number = {2},
month = {2},
year = {2018},
issn = {2347-2693},
pages = {176--183},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1719},
doi = {10.26438/ijcse/v6i2.176183},
publisher = {IJCSE, Indore, INDIA}
}

RIS Citation

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i2.176183
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1719
TI  - Balanced Data Clustering Algorithm for Both Hard and Soft Clustering
T2  - International Journal of Computer Sciences and Engineering
AU  - Purnendu Das
AU  - Bishwa Ranjan Roy
AU  - Saptarshi Paul
PY  - 2018
DA  - 2018/02/28
PB  - IJCSE, Indore, INDIA
SP  - 176
EP  - 183
IS  - 2
VL  - 6
SN  - 2347-2693
ER  -

Abstract

Clustering is a widely studied problem in application domains such as neural networks and statistics. It is the process of partitioning a set of patterns into disjoint clusters so that patterns in the same cluster are alike and patterns in different clusters are different. There are many approaches to this problem. K-means is a simple and effective algorithm that produces good clustering results in many practical applications. However, it is sensitive to the choice of starting points and is inefficient for clustering problems on large datasets. Recently, incremental approaches have been developed to resolve the difficulty of choosing starting points. The global k-means and the fast global k-means algorithms are based on such an approach: they iteratively add one cluster center at a time. Fuzzy c-means is also very popular for fuzzy data clustering. However, all of these algorithms are strongly affected by imbalance in the data values. Each data point in the dataset has multiple attributes, and the values of some attributes may be so large that the values of the other attributes are effectively ignored during clustering. In this paper we propose a data balancing technique for both the fast global k-means and the fuzzy c-means algorithms. We balance the attribute values of each data point in such a way that all attributes carry weight during the clustering process.
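
To illustrate the imbalance problem described above, the following minimal sketch shows how one large-valued attribute can dominate the Euclidean distance used by k-means and fuzzy c-means. The abstract does not specify the balancing technique the paper proposes, so plain min-max rescaling is used here as a stand-in assumption; the function names and the (income, age) data are hypothetical.

```python
import math


def min_max_balance(rows):
    """Rescale every attribute (column) to [0, 1]; constant columns map to 0.

    Assumption: min-max scaling stands in for the paper's balancing step,
    which the abstract does not describe.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def nearest(point_index, rows):
    """Index of the closest other row under Euclidean distance."""
    others = [i for i in range(len(rows)) if i != point_index]
    return min(others, key=lambda i: math.dist(rows[point_index], rows[i]))


# Hypothetical records: (income, age). Income is numerically far larger than age.
data = [
    [50000.0, 25.0],
    [50400.0, 60.0],
    [51000.0, 26.0],
    [54000.0, 40.0],
]

# On raw values the income column decides the neighbor and age is ignored.
raw_neighbor = nearest(0, data)

# After rescaling, both attributes contribute to the distance.
balanced_neighbor = nearest(0, min_max_balance(data))

print(raw_neighbor, balanced_neighbor)
```

On the raw values, record 0 is closest to record 1, which has a similar income but a very different age. After rescaling, it is closest to record 2, which is similar in both attributes.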

Keywords / Index Terms

k-Means, Global k-Means, Fast Global k-Means, Data Streaming
