Open Access   Article

Machine Learning Algorithms in Big data Analytics

K. Sree Divya1, P. Bhargavi2, S. Jyothi3

1 Department of Computer Science, Sri Padmavathi Mahila Viswavidhyalayam, Tirupati, India.
2 Department of Computer Science, Sri Padmavathi Mahila Viswavidhyalayam, Tirupati, India.
3 Department of Computer Science, Sri Padmavathi Mahila Viswavidhyalayam, Tirupati, India.


Section: Review Paper, Product Type: Journal Paper
Volume-6, Issue-1, Page no. 63-70, Jan-2018


Online published on Jan 31, 2018

Copyright © K. Sree Divya, P. Bhargavi, S. Jyothi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



IEEE Style Citation: K. Sree Divya, P. Bhargavi, S. Jyothi, “Machine Learning Algorithms in Big data Analytics”, International Journal of Computer Sciences and Engineering, Vol.6, Issue.1, pp.63-70, 2018.

MLA Style Citation: K. Sree Divya, P. Bhargavi, S. Jyothi "Machine Learning Algorithms in Big data Analytics." International Journal of Computer Sciences and Engineering 6.1 (2018): 63-70.

APA Style Citation: K. Sree Divya, P. Bhargavi, S. Jyothi, (2018). Machine Learning Algorithms in Big data Analytics. International Journal of Computer Sciences and Engineering, 6(1), 63-70.



Big data is a rich source of information and knowledge, flowing from systems to end users. However, handling data in such quantity requires automation, which has driven a trend toward data processing and machine learning techniques. In the ICT sector, as in many other sectors of research and industry, platforms and tools are being developed and offered to help professionals process their data and learn from it automatically. Most of these platforms come from large companies such as Google or Microsoft, or from incubator projects at the Apache Foundation. This review explains machine learning algorithms in big data analytics, discusses how machine learning requires decisions, based on previous lessons, in cases where there is no known “right path” for the specific problem, and enumerates some of the most widely used tools for analyzing and modeling big data.

Keywords / Index Terms

Machine Learning Algorithms, Big data Analytics, Apache Foundation

