Open Access Article

Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD)

M.V. Jagannatha Reddy, D. Ramachandra Reddy, M. Mahesh Kumar

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 9, Page no. 432-437, Sep 2018

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v6i9.432437

Online published on Sep 30, 2018

Copyright © M.V. Jagannatha Reddy, D. Ramachandra Reddy, M. Mahesh Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: M.V. Jagannatha Reddy, D. Ramachandra Reddy, M. Mahesh Kumar, “Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD),” International Journal of Computer Sciences and Engineering, Vol.6, Issue.9, pp.432-437, 2018.

MLA Style Citation: M.V. Jagannatha Reddy, D. Ramachandra Reddy, and M. Mahesh Kumar. "Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD)." International Journal of Computer Sciences and Engineering 6.9 (2018): 432-437.

APA Style Citation: M.V. Jagannatha Reddy, D. Ramachandra Reddy, M. Mahesh Kumar (2018). Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD). International Journal of Computer Sciences and Engineering, 6(9), 432-437.

BibTex Style Citation:
@article{Reddy_2018,
author = {M.V. Jagannatha Reddy and D. Ramachandra Reddy and M. Mahesh Kumar},
title = {Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD)},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {9 2018},
volume = {6},
number = {9},
month = {9},
year = {2018},
issn = {2347-2693},
pages = {432-437},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=2886},
doi = {10.26438/ijcse/v6i9.432437},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i9.432437
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=2886
TI  - Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD)
T2  - International Journal of Computer Sciences and Engineering
AU  - M.V. Jagannatha Reddy
AU  - D. Ramachandra Reddy
AU  - M. Mahesh Kumar
PY  - 2018
DA  - 2018/09/30
PB  - IJCSE, Indore, INDIA
SP  - 432
EP  - 437
IS  - 9
VL  - 6
SN  - 2347-2693
ER  -


Abstract

Clustering mixed datasets is a challenging task. Traditional algorithms such as k-prototype can handle mixed data, but only when the dataset is complete, whereas missing values are common in real datasets. To handle incomplete mixed datasets, we use an extended Squeezer algorithm that incorporates a new dissimilarity measure, Incomplete Set Mixed Dissimilarity (ISMD), defined over both numerical and categorical attribute values. The measure accounts for the dissimilarity contributed by missing values, so the extended Squeezer algorithm clusters the incomplete dataset directly: it requires neither imputation of the missing values nor initialization of clusters at the start. The method is implemented in Python and compared with the traditional k-prototype algorithm on benchmark datasets. The experimental results show that ISMD with the extended Squeezer algorithm gives better accuracy than k-prototype, removes the dependence on initial clusters, and yields a significant improvement in the clustering results.
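
The abstract does not give the ISMD formula, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes a per-attribute dissimilarity (range-normalized difference for numerical attributes, simple mismatch for categorical ones) with a fixed penalty whenever either value is missing, and a one-pass, threshold-based assignment in the spirit of Squeezer. The names `mixed_dissimilarity`, `squeezer_like`, `missing_penalty`, and `threshold` are hypothetical, and comparing a record against the average over cluster members is a simplification chosen for brevity.

```python
import math

MISSING = None  # assumed marker for a missing attribute value


def mixed_dissimilarity(a, b, numeric_idx, ranges, missing_penalty=0.5):
    """Average per-attribute dissimilarity in [0, 1] between two records.

    Assumption: a missing value on either side contributes a fixed penalty
    instead of being imputed. This is NOT the published ISMD formula.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is MISSING or y is MISSING:
            total += missing_penalty
        elif i in numeric_idx:
            # Range-normalized absolute difference for numerical attributes.
            total += abs(x - y) / ranges[i] if ranges[i] else 0.0
        else:
            # Simple mismatch for categorical attributes.
            total += 0.0 if x == y else 1.0
    return total / len(a)


def squeezer_like(records, numeric_idx, ranges, threshold, missing_penalty=0.5):
    """One-pass clustering: no initial clusters and no preset cluster count."""
    clusters = []  # each cluster is a list of its member records
    labels = []
    for rec in records:
        best, best_d = None, math.inf
        for k, members in enumerate(clusters):
            # Mean dissimilarity between the record and the cluster's members.
            d = sum(
                mixed_dissimilarity(rec, m, numeric_idx, ranges, missing_penalty)
                for m in members
            ) / len(members)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > threshold:
            clusters.append([rec])  # too far from every cluster: open a new one
            labels.append(len(clusters) - 1)
        else:
            clusters[best].append(rec)
            labels.append(best)
    return labels


# Toy data: attribute 0 is numerical (observed range 8.0), attribute 1 is categorical.
records = [(1.0, "a"), (1.2, "a"), (9.0, "b"), (None, "b"), (8.5, None)]
labels = squeezer_like(records, numeric_idx={0}, ranges={0: 8.0}, threshold=0.45)
print(labels)
```

In this sketch the number of clusters emerges from `threshold` rather than being fixed in advance, which mirrors the abstract's point that no initial clusters are needed. The result depends on the order in which records are read, as with any one-pass scheme.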

Keywords / Index Terms

Incomplete set mixed dissimilarity, k-prototype, extended squeezer algorithm, Python programming
