Open Access Article

A Comparative Study of Three IR models for Bengali Document Retrieval

Soma Chatterjee, Kamal Sarkar

Section: Research Paper, Product Type: Journal Paper
Volume-07, Issue-01, Page no. 220-225, Jan-2019

Online published on Jan 20, 2019

Copyright © Soma Chatterjee, Kamal Sarkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Soma Chatterjee, Kamal Sarkar, “A Comparative Study of Three IR models for Bengali Document Retrieval,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.220-225, 2019.

MLA Style Citation: Soma Chatterjee, Kamal Sarkar. "A Comparative Study of Three IR models for Bengali Document Retrieval." International Journal of Computer Sciences and Engineering 07.01 (2019): 220-225.

APA Style Citation: Soma Chatterjee, Kamal Sarkar (2019). A Comparative Study of Three IR models for Bengali Document Retrieval. International Journal of Computer Sciences and Engineering, 07(01), 220-225.

BibTex Style Citation:
@article{Chatterjee_2019,
author = {Soma Chatterjee and Kamal Sarkar},
title = {A Comparative Study of Three IR models for Bengali Document Retrieval},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2019},
volume = {07},
number = {01},
month = {1},
year = {2019},
issn = {2347-2693},
pages = {220--225},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=622},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=622
TI - A Comparative Study of Three IR models for Bengali Document Retrieval
T2 - International Journal of Computer Sciences and Engineering
AU - Chatterjee, Soma
AU - Sarkar, Kamal
PY - 2019
DA - 2019/01/20
PB - IJCSE, Indore, INDIA
SP - 220
EP - 225
IS - 01
VL - 07
SN - 2347-2693
ER -


Abstract

In this paper, we study and examine selected information retrieval (IR) approaches for Bengali document retrieval. These approaches use keywords to describe the content of each document. We chose three models, the TF-IDF vector space model, the Latent Semantic Indexing (LSI) model, and the Okapi BM25 model, in order to understand their working mechanisms and shortcomings; this understanding is a prerequisite for overcoming those shortcomings. The models are evaluated on a Bengali dataset and a set of Bengali queries that we created, and the results are reported in the results section of this paper. Our study reveals that the Okapi BM25 model performs best among the three IR models studied for Bengali document retrieval.
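To make the contrast between two of the compared models concrete, the following is a minimal sketch (not the authors' implementation) that ranks a tiny Bengali collection with TF-IDF cosine similarity and with Okapi BM25. The three-document corpus, the whitespace tokenizer, the function names (`tokenize`, `tfidf_scores`, `bm25_scores`, `rank`), the parameter values k1 = 1.2 and b = 0.75, and the choice of the non-negative BM25 idf variant log(1 + (N - df + 0.5) / (df + 0.5)) are all assumptions made for illustration. LSI is left out because it additionally requires a truncated SVD of the term-document matrix.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: plain whitespace split, no stemming or stopword removal.
    return text.split()


def tfidf_scores(query, docs):
    """Cosine similarity between the TF-IDF vectors of the query and each document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / n) for term, n in df.items()}

    def vectorize(tokens):
        # Raw term frequency times idf; query terms unseen in the collection get weight 0.
        return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    q_vec = vectorize(query)
    return [cosine(q_vec, vectorize(doc)) for doc in docs]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 using the idf variant log(1 + (N - df + 0.5) / (df + 0.5))."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation: documents longer than average are penalised.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:  # Counter returns 0 for absent terms
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation: repeated terms add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def rank(scores):
    # Document indices sorted by descending score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy Bengali collection invented for illustration, not the paper's dataset.
    corpus = [
        "ভারত ক্রিকেট দল খবর",  # India cricket team news
        "ক্রিকেট খেলা ক্রিকেট",  # cricket game cricket
        "ভারত নির্বাচন",  # India election
    ]
    docs = [tokenize(d) for d in corpus]
    query = tokenize("ক্রিকেট দল")  # cricket team
    print("TF-IDF ranking:", rank(tfidf_scores(query, docs)))  # [0, 1, 2]
    print("BM25 ranking:  ", rank(bm25_scores(query, docs)))  # [0, 1, 2]
```

On this toy collection both models rank document 0 first, but they weight repeated terms differently: TF-IDF grows linearly with term frequency, whereas BM25 saturates it through k1 and normalises by document length through b. The k1 and b defaults above are common textbook values, not settings reported in the paper.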

Keywords / Index Terms

Information Retrieval, Bengali language, LSI, BM25, probabilistic, Query
