Open Access Article

Comparative Study of Big Data Technologies and Frameworks

Mayank Tripathi, A. K. Agarwal

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 8, Page no. 488-495, Aug 2018

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v6i8.488495

Online published on Aug 31, 2018

Copyright © Mayank Tripathi, A. K. Agarwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Mayank Tripathi, A. K. Agarwal, “Comparative Study of Big Data Technologies and Frameworks,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.8, pp.488-495, 2018.

MLA Style Citation: Mayank Tripathi, A. K. Agarwal "Comparative Study of Big Data Technologies and Frameworks." International Journal of Computer Sciences and Engineering 6.8 (2018): 488-495.

APA Style Citation: Mayank Tripathi, A. K. Agarwal, (2018). Comparative Study of Big Data Technologies and Frameworks. International Journal of Computer Sciences and Engineering, 6(8), 488-495.

BibTex Style Citation:
@article{Tripathi_2018,
author = {Mayank Tripathi and A. K. Agarwal},
title = {Comparative Study of Big Data Technologies and Frameworks},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {8 2018},
volume = {6},
Issue = {8},
month = {8},
year = {2018},
issn = {2347-2693},
pages = {488-495},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=2721},
doi = {https://doi.org/10.26438/ijcse/v6i8.488495},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v6i8.488495
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=2721
TI - Comparative Study of Big Data Technologies and Frameworks
T2 - International Journal of Computer Sciences and Engineering
AU - Mayank Tripathi
AU - A. K. Agarwal
PY - 2018
DA - 2018/08/31
PB - IJCSE, Indore, INDIA
SP - 488
EP - 495
IS - 8
VL - 6
SN - 2347-2693
ER -

Abstract

Organizations' hunger for data insights and the widespread adoption of the World Wide Web have exponentially increased the speed at which data is generated and collected. Capturing, storing, and analyzing these large sets of unstructured data, which have taken the shape of Big Data, is a challenge. In this paper, Big Data is defined from different perspectives to clarify the concept. The architecture of Big Data is analyzed to study its processing mechanism. Big Data technologies such as Hadoop, HBase, MapReduce, Pig, Hive, Sqoop, and Flume are studied and compared based on the features they support. Frameworks such as Apache Spark, Cloudera, and Hortonworks, which are used to run Big Data technologies, are studied comprehensively by highlighting their important features. This paper also presents how data from fields such as the stock market, agriculture, medical health records, and Internet traffic is stored, processed, and analyzed using Big Data technologies and frameworks.

Keywords / Index Terms

Big Data; Hadoop; MapReduce; HBase; Sqoop; Flume; Apache Spark; Cloudera; Hortonworks
