Open Access Article

Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River

D. A. Lingote, Girish S. Katkar

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 3, pp. 550-556, Mar. 2019

CrossRef DOI: https://doi.org/10.26438/ijcse/v7i3.550556

Published online on Mar 31, 2019

Copyright © D. A. Lingote, Girish S. Katkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: D. A. Lingote and G. S. Katkar, “Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River,” International Journal of Computer Sciences and Engineering, vol. 7, no. 3, pp. 550-556, Mar. 2019.

MLA Style Citation: Lingote, D. A., and Girish S. Katkar. "Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River." International Journal of Computer Sciences and Engineering 7.3 (2019): 550-556.

APA Style Citation: Lingote, D. A., & Katkar, G. S. (2019). Interactive data extraction algorithm to extract data from the PDF document, helpful in generating water quality data of the Kanhan River. International Journal of Computer Sciences and Engineering, 7(3), 550-556. https://doi.org/10.26438/ijcse/v7i3.550556

BibTeX Style Citation:
@article{Lingote_2019,
author = {D. A. Lingote and Girish S. Katkar},
title = {Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {3},
month = mar,
year = {2019},
issn = {2347-2693},
pages = {550--556},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3878},
doi = {10.26438/ijcse/v7i3.550556},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i3.550556
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=3878
TI  - Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River
T2  - International Journal of Computer Sciences and Engineering
AU  - Lingote, D. A.
AU  - Katkar, Girish S.
PY  - 2019
DA  - 2019/03/31
PB  - IJCSE, Indore, INDIA
SP  - 550
EP  - 556
IS  - 3
VL  - 7
SN  - 2347-2693
ER  -

Abstract

Nowadays, the internet is very popular and widely used for generating and broadcasting information. Following the current trend, most organizations, laboratories, and institutes release their official and research reports as PDF (Portable Document Format) documents. Because the PDF format has many benefits, it is popular for publishing information on the web. If this widely published information is extracted and reprocessed, it can serve as useful input for many research and development projects. In this paper, we introduce an information extraction algorithm that extracts information from PDF documents using free libraries. Specifically, we target PDF documents containing Kanhan River water quality data, which are freely published on the internet. To present this information clearly, the extracted data is geo-mapped and republished in the public domain, which helps in observing and validating Kanhan River water quality data at different geographical locations.
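The abstract does not spell out the extraction steps, so what follows is only a minimal Python sketch of one plausible post-extraction stage, under stated assumptions. It assumes the PDF text has already been pulled out with a free library (for example Apache PDFBox's `PDFTextStripper` in Java, or `pypdf` in Python), that each table row arrives as one whitespace-separated line beginning with a single-token station name, latitude, and longitude, and that the parameter names (`pH`, `DO`, `BOD`) and all sample values are hypothetical placeholders, not Kanhan River measurements. The GeoJSON output is likewise an assumed target format for geo-mapping; the paper does not say which format it uses. The helper names `parse_rows` and `to_geojson` are hypothetical.

```python
import json
import re

# Hypothetical row layout after PDF text extraction (an assumption):
# <station> <latitude> <longitude> <one numeric value per parameter ...>
# Station names must be a single token for this pattern to match.
ROW_PATTERN = re.compile(
    r"^(?P<station>\S+)\s+(?P<lat>-?\d+(?:\.\d+)?)\s+"
    r"(?P<lon>-?\d+(?:\.\d+)?)\s+(?P<values>.+)$"
)

# Hypothetical extracted text; station names and values are placeholders.
SAMPLE_TEXT = """Water quality monitoring report (hypothetical)
Station Latitude Longitude pH DO BOD
S1 21.10 79.10 7.6 6.8 2.0
S2 21.20 79.20 7.9 6.1 2.4
S3 21.30 79.30 BDL 5.9 2.8
"""


def parse_rows(text, parameters):
    """Turn extracted plain-text table rows into dictionaries.

    Lines that do not look like data rows (titles, headers, footers) are
    skipped, as are rows whose value count differs from the number of
    parameter names or whose values are not all numeric.
    """
    records = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if not match:
            continue
        values = match.group("values").split()
        if len(values) != len(parameters):
            continue
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            continue  # e.g. a non-numeric entry such as "BDL"
        records.append({
            "station": match.group("station"),
            "lat": float(match.group("lat")),
            "lon": float(match.group("lon")),
            **dict(zip(parameters, numbers)),
        })
    return records


def to_geojson(records):
    """Build a GeoJSON FeatureCollection of Point features, one per station."""
    features = []
    for rec in records:
        # Everything except the coordinates becomes a feature property.
        props = {k: v for k, v in rec.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    rows = parse_rows(SAMPLE_TEXT, ["pH", "DO", "BOD"])
    print(json.dumps(to_geojson(rows), indent=2))
```

GeoJSON (RFC 7946) orders each position as longitude first, then latitude, which is why `to_geojson` reverses the order in which the row lists them. Rows with non-numeric entries (such as the "BDL" placeholder for S3) are silently dropped in this sketch; a real pipeline would more likely flag them for the manual validation step the abstract mentions.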

Key-Words / Index Term

PDF Extraction, data generation, Extraction, Kanhan River, information system
