Open Access Article

Web Crawlers for Web Content Extraction

E. Suganya, Vijayarani

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 4, pp. 238-247, April 2019

CrossRef DOI: https://doi.org/10.26438/ijcse/v7i4.238247

Published online on April 30, 2019

Copyright © E. Suganya, Vijayarani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: E. Suganya and Vijayarani, “Web Crawlers for Web Content Extraction,” International Journal of Computer Sciences and Engineering, Vol. 7, Issue 4, pp. 238-247, 2019.

MLA Style Citation: Suganya, E., and Vijayarani. "Web Crawlers for Web Content Extraction." International Journal of Computer Sciences and Engineering 7.4 (2019): 238-247.

APA Style Citation: Suganya, E., & Vijayarani. (2019). Web crawlers for web content extraction. International Journal of Computer Sciences and Engineering, 7(4), 238-247.

BibTeX Style Citation:
@article{Suganya_2019,
author = {E. Suganya and Vijayarani},
title = {Web Crawlers for Web Content Extraction},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {4},
month = {4},
year = {2019},
issn = {2347-2693},
pages = {238--247},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4024},
doi = {10.26438/ijcse/v7i4.238247},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i4.238247
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4024
TI  - Web Crawlers for Web Content Extraction
T2  - International Journal of Computer Sciences and Engineering
AU  - Suganya, E.
AU  - Vijayarani
PY  - 2019
DA  - 2019/04/30
PB  - IJCSE, Indore, INDIA
SP  - 238
EP  - 247
IS  - 4
VL  - 7
SN  - 2347-2693
ER  - 


Abstract

A web crawler is an automated program, or script, that methodically scans, or “crawls”, web pages to build an index of their data. Crawlers are mainly used to fetch new or updated data from websites and store it for easy access. Web crawler tools are becoming widely known to the general public because they simplify and automate the entire crawling process, making web data easily accessible to everyone. Using a web crawler tool frees people from repetitive typing or copy-pasting and can yield a well-structured and complete data collection. Moreover, these tools let users crawl the World Wide Web methodically and quickly without writing code, and transform the data into various formats that meet their requirements. This research work compares several available open source web crawlers intended to search and scrape web data: Visual SEO Studio, Screaming Frog, Wild Shark SEO Spider, ParseHub and HTTrack Website Copier. The experimental analysis identifies the best crawler with respect to the performance factors considered.
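The crawl, index and store loop described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the implementation of any of the tools compared in this paper: it assumes breadth-first traversal, a same-host restriction and a page limit, and the names `crawl`, `LinkAndTitleParser` and `fetch_url` are hypothetical. It uses only the standard library (`urllib`, `html.parser`) and does not check robots.txt or rate-limit requests, both of which a production crawler should do.

```python
import json
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects <a href> targets and the <title> text of one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_url(url: str) -> str:
    """Download one page. A real crawler should also honor robots.txt and rate-limit."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(seed: str, max_pages: int = 20,
          fetch: Callable[[str], str] = fetch_url) -> Dict[str, dict]:
    """Hypothetical sketch: breadth-first crawl limited to the seed's host.

    Returns an index mapping each fetched URL to its title and outgoing-link count.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])   # URLs waiting to be fetched (FIFO, so breadth-first)
    seen = {seed}              # URLs already queued, so each is fetched at most once
    index: Dict[str, dict] = {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue           # skip pages that fail to download
        parser = LinkAndTitleParser()
        parser.feed(html)
        parser.close()
        index[url] = {"title": parser.title.strip(), "links": len(parser.links)}
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return index


if __name__ == "__main__":
    # Hypothetical in-memory "site" so the demo runs without network access.
    fake_site = {
        "https://example.com/": '<title>Home</title>'
                                '<a href="/a">A</a><a href="https://other.org/">X</a>',
        "https://example.com/a": '<title>Page A</title>'
                                 '<a href="/">Home</a><a href="b#top">B</a>',
        "https://example.com/b": '<title>Page B</title>',
    }
    index = crawl("https://example.com/", fetch=fake_site.__getitem__)
    # Export the collected index as JSON, one of many possible output formats.
    print(json.dumps(index, indent=2))
```

Making `fetch` a parameter lets the traversal logic be exercised offline with a dictionary lookup, as in the demo; the off-host link to `other.org` is skipped and the `#top` fragment is stripped, so each page is indexed once. Serializing the index to JSON stands in for the format conversion that the compared tools offer.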

Keywords / Index Terms

Visual SEO Studio, Screaming Frog SEO Spider, Wild Shark, ParseHub, HTTrack
