Open Access Article

Missing Data Imputation to Measure Statistic for Data Mining Applications

Shahid Ali Khan, Praveen Dhyani

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 5, pp. 1215-1220, May 2019

CrossRef DOI: https://doi.org/10.26438/ijcse/v7i5.12151220

Published online on May 31, 2019

Copyright © Shahid Ali Khan, Praveen Dhyani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: S. A. Khan and P. Dhyani, “Missing Data Imputation to Measure Statistic for Data Mining Applications,” International Journal of Computer Sciences and Engineering, vol. 7, no. 5, pp. 1215-1220, 2019.

MLA Style Citation: Khan, Shahid Ali, and Praveen Dhyani. "Missing Data Imputation to Measure Statistic for Data Mining Applications." International Journal of Computer Sciences and Engineering 7.5 (2019): 1215-1220.

APA Style Citation: Khan, S. A., & Dhyani, P. (2019). Missing data imputation to measure statistic for data mining applications. International Journal of Computer Sciences and Engineering, 7(5), 1215-1220.

BibTeX Style Citation:
@article{Khan_2019,
author = {Shahid Ali Khan and Praveen Dhyani},
title = {Missing Data Imputation to Measure Statistic for Data Mining Applications},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {5},
month = may,
year = {2019},
issn = {2347-2693},
pages = {1215--1220},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4389},
doi = {10.26438/ijcse/v7i5.12151220},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i5.12151220
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4389
TI  - Missing Data Imputation to Measure Statistic for Data Mining Applications
T2  - International Journal of Computer Sciences and Engineering
AU  - Khan, Shahid Ali
AU  - Dhyani, Praveen
PY  - 2019
DA  - 2019/05/31
PB  - IJCSE, Indore, INDIA
SP  - 1215
EP  - 1220
IS  - 5
VL  - 7
SN  - 2347-2693
ER  - 


Abstract

In data mining applications, finding the association among datasets is an essential concern. Correlation is a commonly used statistical tool for measuring the association between datasets. The correlation coefficient indicates both the strength and the direction of the relationship between two datasets, and it is generally applied to real-valued data. Large databases, however, contain fields of mixed data types, such as real, nominal, and ordinal, and these fields often have missing values. This paper presents an approach to computing the correlation coefficient between a real-valued dataset and a nominal-valued dataset with missing values.
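The abstract does not give the paper's formula, so the sketch below is not the authors' method. It illustrates two standard measures under stated assumptions: Pearson's r for two real-valued columns, which captures both strength and direction, and the correlation ratio η (eta) for a nominal column against a real-valued one, which captures strength only because nominal categories have no natural order. Missing entries are assumed to be encoded as None; both measures drop incomplete rows, and the hypothetical helpers `impute_mean` and `impute_mode` show the simplest imputation baseline (mean for real values, mode for nominal labels). The input values in the demo are made-up toy numbers.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Optional, Sequence

# Assumption: a missing entry is encoded as None.


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Hypothetical baseline: fill missing real values with the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_mode(labels: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Hypothetical baseline: fill missing nominal labels with the most frequent one."""
    counts = Counter(lab for lab in labels if lab is not None)
    if not counts:
        raise ValueError("no observed labels to impute from")
    mode = counts.most_common(1)[0][0]
    return [mode if lab is None else lab for lab in labels]


def pearson(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson r in [-1, 1], computed only over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a column is constant over the complete pairs")
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(labels: Sequence[Optional[Hashable]],
                      values: Sequence[Optional[float]]) -> float:
    """Correlation ratio eta in [0, 1] between a nominal and a real-valued column.

    eta^2 = between-group sum of squares / total sum of squares, computed
    only over rows where both the label and the value are present.
    """
    pairs = [(c, v) for c, v in zip(labels, values) if c is not None and v is not None]
    if not pairs:
        raise ValueError("no complete rows")
    groups = defaultdict(list)
    for c, v in pairs:
        groups[c].append(v)
    grand_mean = sum(v for _, v in pairs) / len(pairs)
    ss_total = sum((v - grand_mean) ** 2 for _, v in pairs)
    if ss_total == 0:
        raise ValueError("real-valued column is constant")
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    return math.sqrt(ss_between / ss_total)


if __name__ == "__main__":
    # Toy columns with missing entries (illustrative values only).
    x = [32.0, 45.0, None, 51.0, 28.0, 60.0]
    y = [12.0, 18.0, 15.0, None, 10.0, 25.0]
    group = ["north", "south", "south", None, "north", "south"]
    print("r over complete pairs:", pearson(x, y))
    print("eta over complete rows:", correlation_ratio(group, x))
    print("eta after mode/mean imputation:",
          correlation_ratio(impute_mode(group), impute_mean(x)))
```

Dropping incomplete rows is generally only safe when values are missing completely at random; mean or mode imputation keeps every row but reduces variability, which tends to weaken the measured association. A more careful imputation scheme would be needed to avoid either problem.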

Keywords / Index Terms

Data Mining, Real-Valued Data, Nominal-Valued Data, Missing Values
