Open Access Article

Text Clustering Techniques: A Review

Mukesh Kumar, Amandeep Verma

Section: Review Paper, Product Type: Journal Paper
Volume 6, Issue 6, Pages 1091-1099, June 2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i6.10911099

Published online on Jun 30, 2018

Copyright © Mukesh Kumar, Amandeep Verma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

IEEE Style Citation: M. Kumar and A. Verma, “Text Clustering Techniques: A Review,” International Journal of Computer Sciences and Engineering, vol. 6, no. 6, pp. 1091-1099, 2018.

MLA Style Citation: Kumar, Mukesh, and Amandeep Verma. "Text Clustering Techniques: A Review." International Journal of Computer Sciences and Engineering 6.6 (2018): 1091-1099.

APA Style Citation: Kumar, M., & Verma, A. (2018). Text Clustering Techniques: A Review. International Journal of Computer Sciences and Engineering, 6(6), 1091-1099.

BibTex Style Citation:
@article{Kumar_2018,
author = {Mukesh Kumar and Amandeep Verma},
title = {Text Clustering Techniques: A Review},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2018},
volume = {6},
number = {6},
month = {6},
year = {2018},
issn = {2347-2693},
pages = {1091--1099},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=2305},
doi = {10.26438/ijcse/v6i6.10911099},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i6.10911099
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=2305
TI  - Text Clustering Techniques: A Review
T2  - International Journal of Computer Sciences and Engineering
AU  - Kumar, Mukesh
AU  - Verma, Amandeep
PY  - 2018
DA  - 2018/06/30
PB  - IJCSE, Indore, INDIA
SP  - 1091
EP  - 1099
IS  - 6
VL  - 6
SN  - 2347-2693
ER  -

Abstract

Text clustering is an unsupervised data mining technique that partitions an unlabeled dataset into groups of similar data objects. These groups are known as clusters; the objects within a cluster are more similar to one another than to the objects in other clusters. A variety of text clustering techniques are used to compute the similarity among the patterns of an unlabeled dataset. The literature on clustering algorithms is vast, and a fully comprehensive survey would be an immense task. This paper explores text clustering techniques in order to support researchers in future work. It surveys the literature on different text clustering techniques and presents an analysis of various studies in this area. After reviewing these techniques from different aspects, the paper suggests research directions for the field. The survey covers techniques for English text documents as well as for documents in vernacular languages written in scripts such as Gurumukhi.
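
As an illustration of the grouping described above, the following minimal sketch clusters four toy documents so that documents within a cluster are more similar to each other than to documents in the other cluster. The abstract does not name a specific algorithm; the TF-IDF weighting, cosine similarity, and single-link agglomerative merging used here are assumptions chosen for illustration, as are all function names and the example documents.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw term count times a smoothed inverse document frequency.
        vectors.append(
            {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_link_clusters(vectors, k):
    """Repeatedly merge the two most similar clusters until k remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: similarity of the closest pair of members.
                sim = max(
                    cosine(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Hypothetical toy corpus: two documents about cats, two about markets.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten sat together",
    "stock markets fell sharply today",
    "investors sold stock as markets fell",
]
vectors = tfidf_vectors(docs)
print(single_link_clusters(vectors, k=2))
```

In this toy corpus the two topics share no vocabulary, so the cross-topic cosine similarity is zero and the merge order is unambiguous. Real collections would normally need tokenization, stop-word removal, and stemming suited to the language and script of the documents.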

Keywords / Index Terms

Text clustering, Clustering techniques, Data mining techniques, Unsupervised learning, Machine learning
