Open Access   Article Go Back

A Fast Global k-means Algorithm for Datasets having Streaming Behavior

Purnendu Das1 , Bishwa Ranjan Roy2 , Sanju Das3

  1. Dept. of Computer Science, Assam University, Silchar, India.
  2. Dept. of Computer Science, Assam University, Silchar, India.
  3. Dept. of Computer Science, Assam University, Silchar, India.

Correspondence should be addressed to: brroy88@gmail.com.

Section:Research Paper, Product Type: Journal Paper
Volume-6 , Issue-2 , Page no. 84-91, Feb-2018

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v6i2.8491

Online published on Feb 28, 2018

Copyright © Purnendu Das, Bishwa Ranjan Roy, Sanju Das . This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

View this paper at   Google Scholar | DPI Digital Library

How to Cite this Paper

  • IEEE Citation
  • MLA Citation
  • APA Citation
  • BibTex Citation
  • RIS Citation

IEEE Style Citation: Purnendu Das, Bishwa Ranjan Roy, Sanju Das, “A Fast Global k-means Algorithm for Datasets having Streaming Behavior,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.2, pp.84-91, 2018.

MLA Style Citation: Purnendu Das, Bishwa Ranjan Roy, Sanju Das "A Fast Global k-means Algorithm for Datasets having Streaming Behavior." International Journal of Computer Sciences and Engineering 6.2 (2018): 84-91.

APA Style Citation: Purnendu Das, Bishwa Ranjan Roy, Sanju Das, (2018). A Fast Global k-means Algorithm for Datasets having Streaming Behavior. International Journal of Computer Sciences and Engineering, 6(2), 84-91.

BibTex Style Citation:
@article{Das_2018,
author = {Purnendu Das, Bishwa Ranjan Roy, Sanju Das},
title = {A Fast Global k-means Algorithm for Datasets having Streaming Behavior},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {2 2018},
volume = {6},
Issue = {2},
month = {2},
year = {2018},
issn = {2347-2693},
pages = {84-91},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1705},
doi = {https://doi.org/10.26438/ijcse/v6i2.8491}
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO = {https://doi.org/10.26438/ijcse/v6i2.8491}
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=1705
TI - A Fast Global k-means Algorithm for Datasets having Streaming Behavior
T2 - International Journal of Computer Sciences and Engineering
AU - Purnendu Das, Bishwa Ranjan Roy, Sanju Das
PY - 2018
DA - 2018/02/28
PB - IJCSE, Indore, INDIA
SP - 84-91
IS - 2
VL - 6
SN - 2347-2693
ER -

VIEWS PDF XML
1079 461 downloads 366 downloads
  
  
           

Abstract

The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the fast global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. Also the continuously arriving data stream has become common phenomenon for many fields recent years; for example, sensor networks, web click stream and internet traffic flow. Researchers proposes many innovative technologies to manage such streaming datasets. Finding efficient data stream mining algorithm has become an important research subject. In this paper we propose a fast global k-means algorithm for datasets having streaming behavior. Experiment shows that our proposed algorithm is more efficient than the fast global k-means algorithm in case of streaming datasets

Key-Words / Index Term

k-Means, Global k-Means, Fast Global k-Means, Data Streaming

References

[1] Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and issues in data stream systems,” in Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, ser. PODS ’02, 2002, pp. 1–16.
[2] L. Bai, J. Liang, C. Sui, and C. Dang, “Fast global k-means clustering based on local geometrical information,” Information Sciences, vol. 245, no. 0, pp. 168 – 180, 2013.
[3] I. Fodor. (2002) A Survey of Dimen sion Reduction Techniques. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.5098
[4] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” SCIENCE, vol. 290, pp. 2323–2326, 2000.
[5] J. B. Tenenbaum, V. d. Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[6] H.-P. Kriegel, P. Kro¨ger, and A. Zimek, “Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering,” ACM Trans. Knowl. Discov. Data, vol. 3, no. 1, pp. 1:1–1:58, Mar. 2009.
[7] A. Jain and R. Dubes, Eds., Algorithms for Clustering Data. Prentice Hall, 1988.
[8] A. Likas, M. Vlassis, and J. Verbeek, “The global k-means clustering algorithm,” Pattern Recognition, vol. 35, no. 2, pp. 451–461, 2003.
[9] A. Bagirov, “Modified global k-means algorithm for sum-of-squares clustering problem,” Pattern Recognition, vol. 41, pp. 3192–3199, 2008.
[10] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani, “Streaming-data algorithms for high-quality clustering,” in Data Engi- neering, 2002. Proceedings. 18th International Conference on, 2002, pp. 685–694.
[11] P. Domingos and G. Hulten, “A general method for scaling up machine learning algorithms and its application to clustering,” in Proceedings of the Eighteenth International Conference on Machine Learning, ser. ICML ’01, 2001, pp. 106–113.
[12] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “A framework for clus tering evolving data streams,” in Proceedings of the 29th international conference on Very large data bases - Volume 29, ser. VLDB ’03, 2003, pp. 81-92
[13] R. Wan, X. Yan, and X. Su, “A weighted fuzzy clustering algorithm for data stream,” in Proceedings of the 2008 ISECS International Colloquium on Computing, Communication, Control, and Management- Volume 01, ser. CCCM ’08, 2008, pp. 360–364.
[14] L. Bai, J. Liang, C. Sui, and C. Dang, “Fast global k-means clustering based on local geometrical information,” Information Sciences, vol. 245, no. 0, pp. 168 – 180, 2013.
[15] J. Xie and S. Jiang, “A simple and fast algorithm for global k-means clustering,” in Education Technology and Computer Science (ETCS), 2010 Second International Workshop on, vol. 2, March 2010, pp. 36– 40.
[16] L.-E. Sal, J. Carrasco-Ochoa, and M.-T. Fco, “Fast global k-means with similarity functions algorithm,” in Intelligent Data Engineering and Automated Learning IDEAL 2006, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006, vol. 4224, pp. 512–521.