Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis

Mantripatjit Kaur, Gurleen Kaur Dhaliwal

Section: Research Paper, Product Type: Journal Paper
Volume-3, Issue-11, Page no. 66-69, Nov-2015

Online published on Nov 30, 2015

Copyright © Mantripatjit Kaur, Gurleen Kaur Dhaliwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

IEEE Style Citation: Mantripatjit Kaur, Gurleen Kaur Dhaliwal, “Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis,” International Journal of Computer Sciences and Engineering, Vol.3, Issue.11, pp.66-69, 2015.

MLA Style Citation: Mantripatjit Kaur, Gurleen Kaur Dhaliwal. "Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis." International Journal of Computer Sciences and Engineering 3.11 (2015): 66-69.

APA Style Citation: Mantripatjit Kaur, Gurleen Kaur Dhaliwal (2015). Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis. International Journal of Computer Sciences and Engineering, 3(11), 66-69.

BibTex Style Citation:
@article{Kaur_2015,
author = {Mantripatjit Kaur and Gurleen Kaur Dhaliwal},
title = {Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2015},
volume = {3},
Issue = {11},
month = {11},
year = {2015},
issn = {2347-2693},
pages = {66-69},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=728},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=728
TI - Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis
T2 - International Journal of Computer Sciences and Engineering
AU - Mantripatjit Kaur
AU - Gurleen Kaur Dhaliwal
PY - 2015
DA - 2015/11/30
PB - IJCSE, Indore, INDIA
SP - 66-69
IS - 11
VL - 3
SN - 2347-2693
ER -

Abstract

With the continuing advancement of the internet and IT, data volumes have grown tremendously. Data created at a very fast pace, referred to as big data, is a trending term these days. Big data has fascinated computer science enthusiasts around the world and has gained even more prominence in the last few years. This paper compares Hadoop Map Reduce and the more recently introduced Apache Spark, both of which are frameworks for analyzing big data. Although both frameworks target big data, their performance varies significantly with the application under consideration. The two frameworks are compared, and their performance is measured using the word count algorithm on various datasets in both the Hadoop Map Reduce and Apache Spark environments. The framework that performs better is then used to analyze the research dataset of a university.
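For readers unfamiliar with the word count benchmark, the following is a minimal single-machine sketch of its map, shuffle, and reduce steps in plain Python. It is an illustration only and not the authors' Hadoop or Spark code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, and tokenization by whitespace with lowercasing is an assumption.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token in a line."""
    # Assumption: tokens are split on whitespace and lowercased.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted counts by word (done by the framework in a real cluster)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical sample input, not one of the paper's datasets.
    sample = ["big data big ideas", "Spark and Hadoop process big data"]
    print(word_count(sample))
```

In a distributed framework the three steps run in parallel across nodes. As a general characterization rather than a result from this paper, Map Reduce writes intermediate results to disk between stages, whereas Spark can keep them in memory.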

Keywords / Index Terms

Big Data, Hadoop, HDFS, Map Reduce, Apache Spark
