Open Access Article

Heuristic Approach for Designing a Focused Web Crawler using Cuckoo Search

J. Dewanjee

Section: Research Paper, Product Type: Journal Paper
Volume 4, Issue 9, Page no. 59-63, Sep 2016

Online published on Sep 30, 2016

Copyright © J. Dewanjee. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: J. Dewanjee, “Heuristic Approach for Designing a Focused Web Crawler using Cuckoo Search,” International Journal of Computer Sciences and Engineering, Vol.4, Issue.9, pp.59-63, 2016.

MLA Style Citation: J. Dewanjee. "Heuristic Approach for Designing a Focused Web Crawler using Cuckoo Search." International Journal of Computer Sciences and Engineering 4.9 (2016): 59-63.

APA Style Citation: Dewanjee, J. (2016). Heuristic Approach for Designing a Focused Web Crawler using Cuckoo Search. International Journal of Computer Sciences and Engineering, 4(9), 59-63.

BibTex Style Citation:
@article{Dewanjee_2016,
author = {J. Dewanjee},
title = {Heuristic Approach for Designing a Focused Web Crawler using Cuckoo Search},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {9 2016},
volume = {4},
Issue = {9},
month = {9},
year = {2016},
issn = {2347-2693},
pages = {59-63},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1056},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=1056
TI - Heuristic Approach for Designing a Focused Web Crawler using Cuckoo Search
T2 - International Journal of Computer Sciences and Engineering
AU - J. Dewanjee
PY - 2016
DA - 2016/09/30
PB - IJCSE, Indore, INDIA
SP - 59-63
IS - 9
VL - 4
SN - 2347-2693
ER -

Abstract

To find a geographical location on the globe, we usually consult a map. By analogy, to find a Web page on the World Wide Web (WWW), we usually use a Web search engine. A Web crawler collects the resources that a search engine draws on from the WWW, so crawler design is an important task. Millions of searches are performed every minute around the globe, and better search engine resources lead to better search engine performance. The WWW is a huge source of information, but this information is spread across the Internet over many Web servers and hosts. People publish new Web pages on the Internet every day, and as a result the traffic overhead increases exponentially. To produce more accurate results, I follow a heuristic approach to designing a Web crawler that aims to return the best optimized search results in minimal time. This paper develops an approach that uses Cuckoo Search to generate the best result in the least time. The approach has two parts. The first part is the implementation of the crawler, which covers "what to search for" and "from where to search", and also filters out unwanted data. The second part proposes a string matching algorithm for producing the search results.
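The abstract names Cuckoo Search but gives no algorithmic detail. For orientation only, the following is a minimal sketch of a generic cuckoo search with Lévy flights, in the style introduced by Yang and Deb. It is not the paper's crawler: the function names (`levy_step`, `cuckoo_search`), the parameter values (`n_nests`, `pa`, `alpha`, `beta`) and the toy cost function in the usage note are all assumptions made for illustration.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length (Mantegna's method)."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in a degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(cost, dim, bounds, n_nests=15, pa=0.25, iters=200, seed=0):
    """Minimise `cost` over a box; returns (best_nest, best_cost, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    alpha = 0.01 * (hi - lo)  # step scale (an assumed setting)

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [cost(n) for n in nests]
    history = []

    for _ in range(iters):
        # 1. A cuckoo lays an egg: Levy flight from a randomly chosen nest,
        #    clipped back into the search box.
        i = rng.randrange(n_nests)
        egg = [min(hi, max(lo, x + alpha * levy_step(rng))) for x in nests[i]]
        egg_cost = cost(egg)

        # 2. Compare with another random nest; keep whichever is better.
        j = rng.randrange(n_nests)
        if egg_cost < scores[j]:
            nests[j], scores[j] = egg, egg_cost

        # 3. Abandon a fraction pa of the worst nests and rebuild them at
        #    random, always keeping the current best nest.
        order = sorted(range(n_nests), key=lambda k: scores[k])
        n_abandon = min(n_nests - 1, int(pa * n_nests))
        for k in order[n_nests - n_abandon:]:
            nests[k] = random_nest()
            scores[k] = cost(nests[k])

        history.append(min(scores))

    best = min(range(n_nests), key=lambda k: scores[k])
    return nests[best], scores[best], history
```

As a usage example, `cuckoo_search(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` minimises a toy sphere function. In a focused-crawling setting the cost could plausibly be the negated relevance score of a candidate URL, but that mapping is an assumption here, not something the abstract specifies.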

Keywords / Index Terms

Cuckoo search; DNS; meta-heuristic; optimization; pattern recognition; web crawling
