Web Data Scraper Tools: Survey

S. Nain, B. Lall

Section: Survey Paper, Product Type: Journal Paper
Volume 2, Issue 5, Pages 39-44, May 2014

Published online on May 31, 2014

Copyright © S. Nain, B. Lall. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: S. Nain, B. Lall, “Web Data Scraper Tools: Survey,” International Journal of Computer Sciences and Engineering, Vol.2, Issue.5, pp.39-44, 2014.

MLA Style Citation: S. Nain, B. Lall. "Web Data Scraper Tools: Survey." International Journal of Computer Sciences and Engineering 2.5 (2014): 39-44.

APA Style Citation: S. Nain, B. Lall (2014). Web Data Scraper Tools: Survey. International Journal of Computer Sciences and Engineering, 2(5), 39-44.

BibTex Style Citation:
@article{Nain_2014,
author = {S. Nain and B. Lall},
title = {Web Data Scraper Tools: Survey},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2014},
volume = {2},
number = {5},
month = {5},
year = {2014},
issn = {2347-2693},
pages = {39--44},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=156},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=156
TI - Web Data Scraper Tools: Survey
T2 - International Journal of Computer Sciences and Engineering
AU - Nain, S.
AU - Lall, B.
PY - 2014
DA - 2014/05/31
PB - IJCSE, Indore, INDIA
SP - 39
EP - 44
IS - 5
VL - 2
SN - 2347-2693
ER -

Abstract

The World Wide Web contains a huge amount of information, and that amount is increasing rapidly. Data stored on the web are usually in unstructured or semi-structured form. To obtain the essential data from the web, a number of data scraper tools have been developed. In this paper we briefly survey the Web data scraping process, present a taxonomy for characterizing Web data scraper tools, and provide a qualitative analysis of these tools. We hope this work will stimulate other studies aimed at a more comprehensive analysis of data scraping approaches and tools for Web data.

Keywords / Index Terms

Wrapper; Scraper; Document Object Model (DOM)
