Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis

Mantripatjit Kaur and Gurleen Kaur Dhaliwal

Section: Research Paper, Product Type: Journal Paper
Volume 3, Issue 11, Pages 66-69, Nov 2015

Published online on Nov 30, 2015

Copyright © Mantripatjit Kaur, Gurleen Kaur Dhaliwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Mantripatjit Kaur, Gurleen Kaur Dhaliwal, “Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis,” International Journal of Computer Sciences and Engineering, Vol.3, Issue.11, pp.66-69, 2015.

MLA Style Citation: Mantripatjit Kaur, Gurleen Kaur Dhaliwal. "Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis." International Journal of Computer Sciences and Engineering 3.11 (2015): 66-69.

APA Style Citation: Mantripatjit Kaur, Gurleen Kaur Dhaliwal (2015). Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis. International Journal of Computer Sciences and Engineering, 3(11), 66-69.

BibTex Style Citation:
@article{Kaur_2015,
author = {Mantripatjit Kaur and Gurleen Kaur Dhaliwal},
title = {Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2015},
volume = {3},
number = {11},
month = {11},
year = {2015},
issn = {2347-2693},
pages = {66--69},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=728},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=728
TI  - Performance Comparison of Map Reduce and Apache Spark on Hadoop for Big Data Analysis
T2  - International Journal of Computer Sciences and Engineering
AU  - Mantripatjit Kaur
AU  - Gurleen Kaur Dhaliwal
PY  - 2015
DA  - 2015/11/30
PB  - IJCSE, Indore, INDIA
SP  - 66
EP  - 69
IS  - 11
VL  - 3
SN  - 2347-2693
ER  -

Abstract

With the unremitting advancement of the internet and IT, tremendous growth of data has been observed. Data created at a very fast pace, referred to as big data, is a trending term these days. Big data has fascinated computer science enthusiasts around the world and has gained even more prominence in the last few years. This paper compares Hadoop Map Reduce and the more recently introduced Apache Spark, both of which are frameworks for analyzing big data. Although both frameworks target big data, their performance varies significantly depending on the application under consideration. The two frameworks are compared, and their performance is evaluated by running the word count algorithm on various datasets in both the Hadoop Map Reduce and Apache Spark environments. The framework that performs better is then used to analyze the research dataset of a university.
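The word count algorithm used as the benchmark can be illustrated with a minimal single-machine sketch in Python. This is not the authors' implementation and does not use Hadoop or Spark; it only mimics the map, shuffle, and reduce phases in plain Python. The function names and the sample input are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group the emitted counts by word."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, not one of the paper's datasets.
sample_lines = ["big data on Hadoop", "big data on Spark"]
counts = word_count(sample_lines)
```

In Hadoop Map Reduce the intermediate pairs are written to disk between the map and reduce phases, whereas in Spark the same computation is typically expressed as a chain of `flatMap`, `map`, and `reduceByKey` transformations on an RDD that can be kept in memory.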

Keywords / Index Terms

Big Data, Hadoop, HDFS, Map Reduce, Apache Spark
