
Query Execution Performance Analysis of Big Data Using Hive and Pig of Hadoop

Anshu Choudhary, C.S. Satsangi

Section: Survey Paper, Product Type: Journal Paper
Volume-3, Issue-9, Page no. 91-97, Sep-2015

Online published on Oct 01, 2015

Copyright © Anshu Choudhary, C.S. Satsangi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Anshu Choudhary, C.S. Satsangi, “Query Execution Performance Analysis of Big Data Using Hive and Pig of Hadoop,” International Journal of Computer Sciences and Engineering, Vol.3, Issue.9, pp.91-97, 2015.

MLA Style Citation: Anshu Choudhary, C.S. Satsangi. "Query Execution Performance Analysis of Big Data Using Hive and Pig of Hadoop." International Journal of Computer Sciences and Engineering 3.9 (2015): 91-97.

APA Style Citation: Anshu Choudhary, C.S. Satsangi (2015). Query Execution Performance Analysis of Big Data Using Hive and Pig of Hadoop. International Journal of Computer Sciences and Engineering, 3(9), 91-97.

BibTex Style Citation:
@article{Choudhary_2015,
author = {Anshu Choudhary and C.S. Satsangi},
title = {Query Execution Performance Analysis of Big Data Using Hive and Pig of Hadoop},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {9 2015},
volume = {3},
number = {9},
month = {9},
year = {2015},
issn = {2347-2693},
pages = {91-97},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=647},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=647
TI  - Query Execution Performance Analysis of Big Data Using Hive and Pig of Hadoop
T2  - International Journal of Computer Sciences and Engineering
AU  - Anshu Choudhary
AU  - C.S. Satsangi
PY  - 2015
DA  - 2015/10/01
PB  - IJCSE, Indore, INDIA
SP  - 91
EP  - 97
IS  - 9
VL  - 3
SN  - 2347-2693
ER  -

Abstract

Cloud platforms require an efficient computational infrastructure. Such platforms generate huge amounts of data in a fraction of a second, so traditional computing techniques are no longer sufficient. Big Data technologies address this computational load and allow storage to scale with the application’s needs; they constitute a new generation of storage infrastructure (hardware and software). This paper investigates the Big Data environment and presents a comparative study of two of the most frequently used data retrieval techniques, Pig and Hive from the Hadoop ecosystem, both of which provide efficient data processing. For the study, Hadoop storage is prepared first, and Pig and Hive are then configured on top of the MapReduce framework. To evaluate query execution efficiency in terms of processing time, a list of similar queries is prepared and each query is run as an experiment on both techniques. The results show that the query processing time of Hive is lower than that of Pig for the selected new_songs dataset. However, the two data models serve different goals, so each technology is suited to different kinds of computer configurations.
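
The comparison described in the abstract, running equivalent queries on two engines and recording processing time, could be sketched roughly as follows. This is a minimal illustration, not the authors' setup: the helper names `time_command` and `compare` are hypothetical, and the placeholder commands stand in for whatever Hive and Pig invocations an actual experiment would use.

```python
import subprocess
import sys
import time
from statistics import mean


def time_command(cmd, repeats=3):
    """Run cmd (a list of arguments) `repeats` times; return mean wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are never timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return mean(timings)


def compare(query_pairs, repeats=3):
    """Map each query label to (mean seconds for engine A, mean seconds for engine B)."""
    return {
        label: (time_command(cmd_a, repeats), time_command(cmd_b, repeats))
        for label, (cmd_a, cmd_b) in query_pairs.items()
    }


if __name__ == "__main__":
    # Placeholder commands (assumption): replace these with the real command lines
    # that submit the same logical query to each engine.
    engine_a = [sys.executable, "-c", "pass"]
    engine_b = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    results = compare({"placeholder_query": (engine_a, engine_b)}, repeats=2)
    for label, (secs_a, secs_b) in results.items():
        print(f"{label}: engine A {secs_a:.3f}s, engine B {secs_b:.3f}s")
```

Averaging over several repeats smooths out start-up and caching noise; wall-clock time measured this way includes job submission overhead, which may or may not match how the paper measured processing time.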

Keywords / Index Terms

Big Data; Hive; Pig; Performance Analysis; Data Processing; Query Execution Time
