Open Access Article

Optimizing Document Clustering for Dimension Reduction using improved k-means

J. Verma, N. Verma

Section: Research Paper, Product Type: Journal Paper
Volume-6, Issue-6, Page no. 233-238, Jun-2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i6.233238

Online published on Jun 30, 2018

Copyright © J. Verma, N. Verma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: J. Verma, N. Verma, “Optimizing Document Clustering for Dimension Reduction using improved k-means,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.6, pp.233-238, 2018.

MLA Style Citation: J. Verma, N. Verma. "Optimizing Document Clustering for Dimension Reduction using improved k-means." International Journal of Computer Sciences and Engineering 6.6 (2018): 233-238.

APA Style Citation: J. Verma, N. Verma (2018). Optimizing Document Clustering for Dimension Reduction using improved k-means. International Journal of Computer Sciences and Engineering, 6(6), 233-238.

BibTex Style Citation:
@article{Verma_2018,
author = {J. Verma and N. Verma},
title = {Optimizing Document Clustering for Dimension Reduction using improved k-means},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2018},
volume = {6},
number = {6},
month = {6},
year = {2018},
issn = {2347-2693},
pages = {233--238},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=2169},
doi = {10.26438/ijcse/v6i6.233238},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i6.233238
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=2169
TI  - Optimizing Document Clustering for Dimension Reduction using improved k-means
T2  - International Journal of Computer Sciences and Engineering
AU  - Verma, J.
AU  - Verma, N.
PY  - 2018
DA  - 2018/06/30
PB  - IJCSE, Indore, INDIA
SP  - 233
EP  - 238
IS  - 6
VL  - 6
SN  - 2347-2693
ER  -


Abstract

Clustering groups similar documents into a single cluster and places dissimilar documents in other clusters; document clustering applies this process to text documents. The k-means clustering algorithm is a center-based approach that selects its initial centers randomly. In this paper, an improved k-means clustering algorithm is applied to text documents, with the initial centers chosen manually rather than at random. Standard k-means uses cosine similarity, whereas the improved k-means uses a Euclidean similarity measure to group similar documents into a single cluster. According to the experimental results, the improved k-means achieves higher accuracy than the existing k-means algorithm. Performance of the proposed algorithm is measured in terms of precision, recall, F-measure, and running time.
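The abstract does not spell out how the centers are selected, so the following is only a minimal sketch of the general idea, not the authors' procedure: documents are turned into TF-IDF vectors and clustered by k-means using Euclidean distance, with the initial centers supplied by hand instead of drawn at random. The function names (`tfidf_vectors`, `euclidean`, `kmeans`), the smoothed IDF formula, the toy documents, and the choice of the first two documents as seeds are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors for non-empty, whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so shared terms keep some weight.
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] / len(tokens) * idf[term] for term in vocab])
    return vocab, vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, initial_centers, max_iter=100):
    """K-means with caller-supplied initial centers and Euclidean distance."""
    centers = [list(center) for center in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: euclidean(vec, centers[j]))
            for vec in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [vec for vec, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy corpus (hypothetical): two documents about cats, two about dogs.
docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "cats sleep all day",
    "dogs run all day",
]
vocab, vectors = tfidf_vectors(docs)

# Hand-picked seeds: one cat document and one dog document.
labels, centers = kmeans(vectors, initial_centers=[vectors[0], vectors[1]])
print(labels)
```

Because the seeds are fixed, repeated runs give the same assignment, which is the practical difference from random initialization. Swapping `euclidean` for a cosine-based distance would give the comparison baseline the abstract mentions.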

Keywords / Index Terms

Clustering, Document clustering, TF-IDF, K-means, Euclidean similarity
