Web Crawlers for Web Content Extraction
E. Suganya, Vijayarani
Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-4, Page no. 238-247, Apr-2019
CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i4.238247
Online published on Apr 30, 2019
Copyright © E. Suganya, Vijayarani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: E. Suganya, Vijayarani, “Web Crawlers for Web Content Extraction,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.4, pp.238-247, 2019.
MLA Style Citation: E. Suganya, Vijayarani. "Web Crawlers for Web Content Extraction." International Journal of Computer Sciences and Engineering 7.4 (2019): 238-247.
APA Style Citation: E. Suganya, Vijayarani (2019). Web Crawlers for Web Content Extraction. International Journal of Computer Sciences and Engineering, 7(4), 238-247.
BibTex Style Citation:
@article{Suganya_2019,
author = {E. Suganya and Vijayarani},
title = {Web Crawlers for Web Content Extraction},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2019},
volume = {7},
number = {4},
month = {4},
year = {2019},
issn = {2347-2693},
pages = {238--247},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4024},
doi = {10.26438/ijcse/v7i4.238247},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i4.238247
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4024
TI  - Web Crawlers for Web Content Extraction
T2  - International Journal of Computer Sciences and Engineering
AU  - E. Suganya
AU  - Vijayarani
PY  - 2019
DA  - 2019/04/30
PB  - IJCSE, Indore, INDIA
SP  - 238
EP  - 247
IS  - 4
VL  - 7
SN  - 2347-2693
ER  - 
Abstract
A web crawler is an automated program, or script, that methodically scans (“crawls”) web pages to build an index of their data. It is mainly used to fetch new or updated data from websites and store that data for easy access. Web crawler tools are becoming widely known to the general public because they simplify and automate the entire crawling process, making web data resources easily accessible to everyone. A web crawler tool frees users from repetitive typing or copy-pasting and can be expected to produce a well-structured and complete data collection. Moreover, these tools let users crawl the World Wide Web quickly and methodically without writing code, and transform the data into various formats that meet their requirements. This research work compares several available open source web crawlers intended for searching and scraping web data: Visual SEO Studio, Screaming Frog SEO Spider, Wild Shark SEO Spider, ParseHub and HTTrack. The experimental analysis identifies the best crawler based on the performance factors considered.
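The crawling process summarized in the abstract (methodically visiting pages, collecting their data and storing it in a reusable format) can be sketched with the Python standard library. This is a hypothetical, minimal breadth-first crawler, not the implementation used by any of the compared tools; the names `crawl`, `to_csv` and `to_json`, the same-host restriction, the `max_pages` limit and the injectable `fetch` parameter are all illustrative assumptions.

```python
"""Minimal breadth-first web crawler sketch (standard library only).

Assumptions (not from the paper): the crawler stays on the seed URL's host,
stops after `max_pages` pages, and records each page's <title> and same-host links.
"""
import csv
import io
import json
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkTitleParser(HTMLParser):
    """Collects href values from <a> tags and the text inside <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_url(url: str) -> str:
    """Default fetcher: download a page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed: str, max_pages: int = 20,
          fetch: Callable[[str], str] = fetch_url) -> list:
    """Breadth-first crawl from `seed`, visiting each same-host URL at most once."""
    host = urlparse(seed).netloc
    queue = deque([seed])   # frontier of URLs still to visit
    seen = {seed}           # URLs already queued, to avoid revisits
    records = []
    while queue and len(records) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = LinkTitleParser()
        parser.feed(html)
        links = []
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host:
                links.append(absolute)
                if absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        records.append({"url": url, "title": parser.title.strip(), "links": links})
    return records


def to_json(records: list) -> str:
    """Export crawl records as a JSON string."""
    return json.dumps(records, indent=2)


def to_csv(records: list) -> str:
    """Export one CSV row per page: URL, title and number of same-host links."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "title", "outgoing_links"])
    for r in records:
        writer.writerow([r["url"], r["title"], len(r["links"])])
    return buf.getvalue()
```

For example, `crawl("https://example.com/", max_pages=5)` would return up to five page records that `to_csv` or `to_json` can export. Breadth-first order visits the pages closest to the seed first, which suits site audits of the kind SEO spiders perform; a production crawler should also honor `robots.txt` (for instance via `urllib.robotparser`) and throttle its requests.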
Keywords / Index Terms
Visual SEO Studio, Screaming Frog SEO Spider, Wild Shark, ParseHub, HTTrack