Open Access Article

Clustering Algorithms Validated Using Relative Index Validation

T. Senthil Selvi, R. Parimala

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 10, Page no. 85-95, Oct 2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i10.8595

Online published on Oct 31, 2018

Copyright © T. Senthil Selvi, R. Parimala. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: T. Senthil Selvi, R. Parimala, “Clustering Algorithms Validated Using Relative Index Validation,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.10, pp.85-95, 2018.

MLA Style Citation: T. Senthil Selvi, R. Parimala "Clustering Algorithms Validated Using Relative Index Validation." International Journal of Computer Sciences and Engineering 6.10 (2018): 85-95.

APA Style Citation: T. Senthil Selvi, R. Parimala, (2018). Clustering Algorithms Validated Using Relative Index Validation. International Journal of Computer Sciences and Engineering, 6(10), 85-95.

BibTex Style Citation:
@article{Selvi_2018,
author = {T. Senthil Selvi and R. Parimala},
title = {Clustering Algorithms Validated Using Relative Index Validation},
journal = {International Journal of Computer Sciences and Engineering},
volume = {6},
number = {10},
month = {10},
year = {2018},
issn = {2347-2693},
pages = {85--95},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=2986},
doi = {10.26438/ijcse/v6i10.8595},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i10.8595
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=2986
TI  - Clustering Algorithms Validated Using Relative Index Validation
T2  - International Journal of Computer Sciences and Engineering
AU  - T. Senthil Selvi
AU  - R. Parimala
PY  - 2018
DA  - 2018/10/31
PB  - IJCSE, Indore, INDIA
SP  - 85
EP  - 95
IS  - 10
VL  - 6
SN  - 2347-2693
ER  - 

Abstract

Clustering is the task of finding groups of objects such that objects within a group are similar to one another and dissimilar from objects in other groups. This work uses a feature selection technique, Document Frequency Feature Selection (DFFS), and two feature extraction techniques, Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA), which construct a small set of features from the original features. The K-Means algorithm is then run on the newly constructed features without any loss of information. Over several runs, the accuracy of the clustering algorithms is evaluated and the results are recorded. Cluster validation is then performed on the obtained results: internal validation measures are used to evaluate the clusters and, based on these measures, a relative validation measure is used to determine the best clustering algorithm. Experiments are conducted on various benchmark datasets comprising unlabelled documents, and the final results show that DFFS and KPCA followed by the K-Means algorithm give the best clustering accuracy.
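As an illustration only, the following minimal Python sketch shows two of the ideas named in the abstract: selecting terms by document frequency, and comparing candidate partitions by an internal validity index. It does not reproduce the authors' pipeline: PCA, KPCA and K-Means are omitted, the silhouette width is assumed here as the internal index, and the toy documents, the `min_df` threshold, the hand-written partitions and all function names are hypothetical.

```python
import math
from collections import Counter


def document_frequency_select(docs, min_df=2):
    """Keep terms that occur in at least `min_df` documents (assumed threshold)."""
    df = Counter(term for doc in docs for term in set(doc.split()))
    return sorted(term for term, count in df.items() if count >= min_df)


def vectorize(docs, vocab):
    """Build term-count vectors restricted to the selected vocabulary."""
    return [[doc.split().count(term) for term in vocab] for doc in docs]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance(p, others):
    return sum(euclidean(p, q) for q in others) / len(others)


def silhouette(points, labels):
    """Mean silhouette width; higher is better. Needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, p in enumerate(points):
        own = [points[j] for j in clusters[labels[i]] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_distance(p, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_distance(p, [points[j] for j in members])
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): three "fruit" and three "car" documents.
docs = [
    "apple banana fruit",
    "banana fruit salad",
    "apple fruit juice",
    "engine car wheel",
    "car wheel road",
    "engine car road",
]
vocab = document_frequency_select(docs, min_df=2)
points = vectorize(docs, vocab)

# Two candidate partitions, written by hand instead of produced by K-Means.
candidates = {
    "by_topic": [0, 0, 0, 1, 1, 1],
    "alternating": [0, 1, 0, 1, 0, 1],
}
# Relative validation: rank the candidates by the same internal index.
scores = {name: silhouette(points, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(vocab)
print(best)
```

In a fuller experiment the candidate partitions would come from clustering runs on differently reduced feature sets, and several internal indices could be combined before ranking.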

Keywords / Index Terms

Clustering, Relative Validity Measures, PCA, KPCA
