Web Crawlers for Web Content Extraction

E. Suganya, Vijayarani

Open Access Article

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 4, Page no. 238-247, Apr 2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i4.238247

Online published on Apr 30, 2019

Copyright © E. Suganya, Vijayarani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: E. Suganya, Vijayarani, “Web Crawlers for Web Content Extraction,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.4, pp.238-247, 2019.

MLA Style Citation: E. Suganya, Vijayarani. "Web Crawlers for Web Content Extraction." International Journal of Computer Sciences and Engineering 7.4 (2019): 238-247.

APA Style Citation: E. Suganya, Vijayarani (2019). Web Crawlers for Web Content Extraction. International Journal of Computer Sciences and Engineering, 7(4), 238-247.

BibTex Style Citation:
@article{Suganya_2019,
author = {E. Suganya and Vijayarani},
title = {Web Crawlers for Web Content Extraction},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2019},
volume = {7},
number = {4},
month = {4},
year = {2019},
issn = {2347-2693},
pages = {238-247},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4024},
doi = {10.26438/ijcse/v7i4.238247},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v7i4.238247
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=4024
TI - Web Crawlers for Web Content Extraction
T2 - International Journal of Computer Sciences and Engineering
AU - E. Suganya
AU - Vijayarani
PY - 2019
DA - 2019/04/30
PB - IJCSE, Indore, INDIA
SP - 238
EP - 247
IS - 4
VL - 7
SN - 2347-2693
ER -

Abstract

A web crawler is an automated program, or script, that methodically scans, or “crawls”, web pages to create an index of their data. Its main purpose is to fetch new or updated data from websites and store it for easy access. Web crawler tools are becoming widely known because they simplify and automate the entire crawling process, making web data resources easily accessible to everyone. Using a web crawler tool frees people from repetitive typing or copy-pasting and yields a well-structured and complete data collection. Moreover, these tools enable users to crawl the World Wide Web methodically and quickly without coding, and to transform the data into various formats that conform to user requirements. This research work compares several available open source web crawlers intended to search and scrape web data: Visual SEO Studio, Screaming Frog SEO Spider, WildShark SEO Spider, ParseHub and HTTrack. The experimental analysis identifies the best crawler based on performance factors.
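
As an illustration of the crawl-and-index loop described in the abstract, the following is a minimal sketch in Python and not the method of any of the compared tools. It assumes a breadth-first traversal, a caller-supplied `fetch` function that maps a URL to HTML, and a hypothetical in-memory site (`FAKE_SITE`, with made-up `example.test` URLs) so that it runs without network access. The names `LinkAndTextParser`, `crawl` and `max_pages` are illustrative choices.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only non-empty text nodes, with surrounding whitespace removed.
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url; returns {url: extracted text}.

    `fetch` is a caller-supplied function mapping a URL to HTML, or to
    None when the page is unavailable (an assumption of this sketch).
    """
    index = {}
    seen = {seed_url}
    queue = deque([seed_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        index[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


# Hypothetical in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a> Home',
    "http://example.test/a": '<a href="/">back</a> Page A',
    "http://example.test/b": "Page B",
}

pages = crawl("http://example.test/", FAKE_SITE.get)
```

A real crawler would replace `FAKE_SITE.get` with an HTTP fetch and would typically also honour robots.txt, rate limits and per-host scoping, all of which this sketch omits.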

Keywords / Index Terms

Visual SEO Studio, Screaming Frog SEO Spider, Wild Shark, ParseHub, HTTrack
