Open Access Article

Significance of learning methods for mining of real time data streams

E.Padmalatha 1 , S.Sailekya 2

  1. Dept. of CSE, CBIT, Bvrith, India.
  2. Dept. of CSE, CBIT, Bvrith, India.

Section: Research Paper, Product Type: Journal Paper
Volume-6, Issue-3, Page no. 188-209, Mar-2018

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v6i3.188209

Online published on Mar 30, 2018

Copyright © E.Padmalatha, S.Sailekya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: E.Padmalatha, S.Sailekya, “Significance of learning methods for mining of real time data streams,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.3, pp.188-209, 2018.

MLA Style Citation: E.Padmalatha, S.Sailekya "Significance of learning methods for mining of real time data streams." International Journal of Computer Sciences and Engineering 6.3 (2018): 188-209.

APA Style Citation: E.Padmalatha, S.Sailekya, (2018). Significance of learning methods for mining of real time data streams. International Journal of Computer Sciences and Engineering, 6(3), 188-209.

BibTex Style Citation:
@article{_2018,
author = {E. Padmalatha and S. Sailekya},
title = {Significance of learning methods for mining of real time data streams},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2018},
volume = {6},
number = {3},
month = {3},
year = {2018},
issn = {2347-2693},
pages = {188-209},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1783},
doi = {https://doi.org/10.26438/ijcse/v6i3.188209},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v6i3.188209
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=1783
TI - Significance of learning methods for mining of real time data streams
T2 - International Journal of Computer Sciences and Engineering
AU - E.Padmalatha
AU - S.Sailekya
PY - 2018
DA - 2018/03/30
PB - IJCSE, Indore, INDIA
SP - 188
EP - 209
IS - 3
VL - 6
SN - 2347-2693
ER -

Abstract

Stream data is now, more than ever, highly distributed, loosely structured, increasingly large in volume, and changing over time. Broadly speaking, two trends stand out: first, the volume of data is increasing exponentially each year; second, new data is generated at increasing speed, reflects distinct concepts, and changes over time. Stream data is generated by a number of sources. Data streaming applications typically deal with large amounts of data over an extended period of time. However, in most cases the user is interested only in recent data rather than the whole data set. Furthermore, stream data tends to exhibit concept drift, i.e., the data evolves over time. Algorithms that give the whole data set the same importance would therefore produce distorted results, because in such cases the majority of the processed data would no longer be valid. Sometimes the nature of a data stream itself requires giving up a certain amount of precision, because its high volume could not be processed otherwise and one would end up with no information at all. If the data distribution is stable, mining a data stream is largely the same as mining a large data set, since statistically it is easy to mine a sufficient sample. The expectations of mining data streams are finding and understanding changes and maintaining an up-to-date model. For evolving data, two classes of problems are of particular interest: model maintenance and change detection. The goal of model maintenance is to maintain a data mining model under inserts and deletes of blocks of data. In this model, older data is available if necessary. Change detection concerns quantifying the difference between two sets of data and determining when the change is statistically significant. Data streams can be seen as stochastic processes in which events occur continuously and independently of one another [1]. Querying data streams is quite different from querying in the conventional relational model.
A key idea is that operating on the data stream model does not preclude the use of data in conventional stored relations; the data might be transient. The methods proposed in this paper address the classification of balanced and unbalanced data streams by considering concept drift and data skewness. Classification accuracy depends on the choice of learning model. Concept drift plays a vital role when data streams are classified. Compared with traditional classification, data stream classification needs more accurate methods, because traditional methods always follow the trained model, which may not predict novel classes. In data streams, an unsupervised learning model that takes concept drift into account can predict a novel class. In the proposed methodology, classification of data streams is addressed by ensemble methods with supervised learning, together with unsupervised learning for novel class detection, to increase the accuracy of the system. A scalable and adaptable online genetic algorithm is proposed to mine classification rules for large data streams with concept drifts. Data skewness is addressed at both the data level and the algorithmic level so as to favor the positive class.
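
The change-detection problem mentioned in the abstract (quantifying the difference between two sets of data and deciding when it is statistically significant) can be illustrated with a minimal sketch. This is not the method proposed in the paper; it assumes values bounded in [0, 1] that are drawn independently, and it uses a Hoeffding-style threshold chosen here purely for illustration. The names `hoeffding_eps`, `change_detected`, and the sample windows are hypothetical.

```python
import math


def hoeffding_eps(n, delta):
    """Hoeffding deviation bound for the mean of n independent values in [0, 1]."""
    return math.sqrt(math.log(4.0 / delta) / (2.0 * n))


def change_detected(reference, recent, delta=0.01):
    """Flag a change when the two window means differ by more than chance allows.

    Assumes every value lies in [0, 1], e.g. 0/1 error indicators.
    If both windows share the same true mean, the false-alarm probability
    is at most delta (union bound over the two Hoeffding bounds).
    """
    mean_ref = sum(reference) / len(reference)
    mean_new = sum(recent) / len(recent)
    threshold = hoeffding_eps(len(reference), delta) + hoeffding_eps(len(recent), delta)
    return abs(mean_ref - mean_new) > threshold


# Hypothetical 0/1 error indicators of a classifier on three windows.
reference = [1, 0, 0, 0, 0] * 100  # error rate 0.2
similar = [0, 0, 1, 0, 0] * 100    # error rate 0.2
drifted = [1, 1, 1, 0, 0] * 100    # error rate 0.6

print(change_detected(reference, similar))  # False
print(change_detected(reference, drifted))  # True
```

With 500 values per window and `delta=0.01`, the threshold is roughly 0.155, so a jump in error rate from 0.2 to 0.6 is flagged while two windows with the same rate are not. A smaller `delta` makes the detector more conservative.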

Keywords / Index Terms

Data Mining

References

[1] G. Hulten, L. Spencer, and P. Domingos, “Mining Time-Changing Data Streams,” Proc. Seventh ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’01), pp. 97-106, 2001.
[2] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998.
[3] C.X. Ling and V.S. Sheng. Cost-sensitive learning and the class imbalance problem. Encyclopedia of Machine Learning, 2008.
[4] Pedro Domingos, Geoff Hulten, “Mining High Speed Data Streams”, KDD-00, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, USA, 2000, pp. 71-80.
[5] Leo Breiman (2001). Random Forests. Machine Learning, 45(1):5-32.
[6] J.C. Schlimmer and R.H. Granger. Beyond incremental processing: Tracking concept drift. In Proceedings of the Fifth National Conference on Artificial Intelligence, pages 502-507. AAAI Press, Menlo Park, CA, 1986.
[7] N. Japkowicz and S. Stephen. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429-449, 2002.
[8] W. Nick Street and Yong Seog Kim. A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification. KDD-01. San Francisco, CA.
[9] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998.
[10] Dougherty, J., R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of the International Conference on Machine Learning (ICML-1995), 1995.
[11] Pedro Domingos, Geoff Hulten, “Mining High Speed Data Streams”, KDD-00, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, USA, 2000, pp. 71-80.
[12] A. Tsymbal. “The problem of concept drift: definitions and related work”, Technical Report TCD-CS-2004-15, Computer Science Department, Trinity College Dublin, Ireland. 2004.
[13] W. Nick Street and Yong Seog Kim. A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification. KDD-01. San Francisco, CA.
[14] E Padmalatha, C R K Reddy and Padmaja B Rani. Article: Ensemble Classification for Drifting Concept. International Journal of Computer Applications 80(11):33-36, October 2013.
[15] E. Padmalatha, C.R.K. Reddy, B. Padmaja Rani, “Classification of Concept Drift Data Streams”, in Proceedings of the Fifth International Conference on Information Science and Applications (ICISA 2014), IEEE, pp. 291-295, 2014.
[16] Periasamy Vivekanandan and Raju Nedunchezhian, “Mining data streams with concept drifts using genetic algorithm”, Artificial Intelligence Review, Vol. 36, Issue 3, pp 163-178, Springer, October 2011.
[17] Basheer M. Al-Maqaleh and Hamid Shahbazkia, “A Genetic Algorithm for Discovering Classification Rules in Data Mining”, International Journal of Computer Applications (0975-8887), Vol. 41-No. 18, March 2012.
[18] Syed Shaheena and Shaik Habeeb, “Classification Rule Discovery Using Genetic Algorithm-Based Approach”, NIMRA Institute, Department of CSE, IJCTT Journal, Vol. 4, Issue 8, pp 2710-2715, August 2013.
[19] E Padmalatha, C R K Reddy and Padmaja B Rani. Article: Classification of Concept-Drifting Data Streams using Optimized Genetic Algorithm. International Journal of Computer Applications 125(15):1-6, September 2015.
[20] Wei Liu, Sanjay Chawla, David A. Cieslak, Nitesh V. Chawla, “A Robust Decision Tree Algorithm for Imbalanced Data Sets”, 2010.
[21] Xu-Ying Liu, Jianxin Wu, Zhi-Hua Zhou, “Exploratory Undersampling for Class-Imbalance Learning”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 39, No. 2, April 2009, pp. 539-550.
[22] X.Y. Liu, J. Wu, and Z.H. Zhou. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(2):539-550, 2009. J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[23] Junfeng Pan and Qiang Yang, Yiming Yang and Lei Li, Frances Tianyi Li and George Wenmin Li, “Cost-Sensitive Data Preprocessing for Mining Customer Relationship Management Databases”, January/February 2007, A Technical Report.
[24] J. Read, B. Pfahringer, G. Holmes, and E. Frank. Classifier chains for multi-label classification. In Proceedings of ECML PKDD ’09, pages 254–269, 2009.
[25] Macskassy, S.A. and Provost, F.J., “Confidence Bands for ROC Curves,” CeDER Working Paper 02-04, Stern School of Business, New York University, NY, NY 10012. Jan 2004.