Open Access Article

Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop

M.S. Palle, K. Jyothsna, B. Anusha

Section: Research Paper, Product Type: Journal Paper
Volume 2, Issue 1, Page no. 1-5, Jan 2014

Online published on Feb 04, 2014

Copyright © M.S. Palle, K. Jyothsna, B. Anusha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: M.S. Palle, K. Jyothsna, B. Anusha, “Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop,” International Journal of Computer Sciences and Engineering, Vol.2, Issue.1, pp.1-5, 2014.

MLA Style Citation: M.S. Palle, K. Jyothsna, B. Anusha. "Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop." International Journal of Computer Sciences and Engineering 2.1 (2014): 1-5.

APA Style Citation: M.S. Palle, K. Jyothsna, B. Anusha (2014). Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop. International Journal of Computer Sciences and Engineering, 2(1), 1-5.

BibTex Style Citation:
@article{Palle_2014,
author = {M.S. Palle and K. Jyothsna and B. Anusha},
title = {Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2014},
volume = {2},
Issue = {1},
month = {1},
year = {2014},
issn = {2347-2693},
pages = {1-5},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=30},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=30
TI - Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop
T2 - International Journal of Computer Sciences and Engineering
AU - Palle, M.S.
AU - Jyothsna, K.
AU - Anusha, B.
PY - 2014
DA - 2014/02/04
PB - IJCSE, Indore, INDIA
SP - 1
EP - 5
IS - 1
VL - 2
SN - 2347-2693
ER -

Abstract

Data sets used to fuel the recently popular concept of "business intelligence" are becoming increasingly large. Conventional database management software is no longer efficient enough at this scale; parallel database management systems and massive data-scale processing systems like MapReduce look promising instead. Although MapReduce is a good option, it is difficult to work with, because the programmer has to think at the mapper and reducer level. In this paper, we present a simple yet efficient way to mine useful information, in which a program can be written as a series of steps. We queried a supercomputer log file using Apache's Hadoop and PIG, obtained results showing when and why the supercomputer had failed, and compared these results to those of a traditional program.
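To illustrate what "a program written as a series of steps" can look like, the following is a minimal Python sketch of a load, filter, group-and-count pipeline over a log. It is not the authors' PIG script and does not run on Hadoop. The log line layout (`timestamp node severity message`), the `FATAL` marker, the sample lines, and all function names are assumptions made up for this illustration, since the abstract does not describe the actual log format.

```python
from collections import Counter

# Invented sample lines in an assumed layout: "<timestamp> <node> <severity> <message>".
# The real supercomputer log format is not given in the abstract.
SAMPLE_LOG = [
    "2005-06-03-15.42.50 R02-M1-N0 FATAL data TLB error interrupt",
    "2005-06-03-15.43.10 R02-M1-N0 INFO generating core.2275",
    "2005-06-04-01.10.05 R17-M0-N4 FATAL machine check interrupt",
    "malformed line",
    "2005-06-04-02.00.00 R02-M1-N0 FATAL data TLB error interrupt",
]


def load(lines):
    """Step 1 (load): split each line into named fields, skipping malformed lines."""
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) == 4:
            timestamp, node, severity, message = parts
            yield {
                "timestamp": timestamp,
                "node": node,
                "severity": severity,
                "message": message,
            }


def keep_failures(records):
    """Step 2 (filter): keep only records whose severity is FATAL (assumed marker)."""
    return (r for r in records if r["severity"] == "FATAL")


def count_by(records, key):
    """Step 3 (group and count): count records per value returned by key(record)."""
    return Counter(key(r) for r in records)


failures = list(keep_failures(load(SAMPLE_LOG)))

# "Why" the machine failed: failures per message text.
failures_per_cause = count_by(failures, lambda r: r["message"])

# "When" the machine failed: failures per calendar day (first 10 characters).
failures_per_day = count_by(failures, lambda r: r["timestamp"][:10])

for cause, n in failures_per_cause.most_common():
    print(n, cause)
```

Each function corresponds to one step of the dataflow, so the reader never has to reason about mappers and reducers directly. That is the style of programming the abstract contrasts with hand-written MapReduce.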

Keywords / Index Terms

Big Data, Parallel Processing, Hadoop, MapReduce, Data Mining, Business Intelligence, PIG, Log file analysis, Supercomputer
