Open Access   Article

Comparison of Generative and Discriminative Models of Part of Speech Taggers for Marathi Language

Rushali Dhumal Deshmukh

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 10, Page no. 16-21, Oct 2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i10.1621

Online published on Oct 31, 2018

Copyright © Rushali Dhumal Deshmukh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Citation

IEEE Style Citation: Rushali Dhumal Deshmukh, “Comparison of Generative and Discriminative Models of Part of Speech Taggers for Marathi Language”, International Journal of Computer Sciences and Engineering, Vol.6, Issue.10, pp.16-21, 2018.

MLA Style Citation: Rushali Dhumal Deshmukh. "Comparison of Generative and Discriminative Models of Part of Speech Taggers for Marathi Language." International Journal of Computer Sciences and Engineering 6.10 (2018): 16-21.

APA Style Citation: Rushali Dhumal Deshmukh, (2018). Comparison of Generative and Discriminative Models of Part of Speech Taggers for Marathi Language. International Journal of Computer Sciences and Engineering, 6(10), 16-21.


Abstract

Part-of-speech (POS) tagging is the process of assigning a grammatical category to each word. POS taggers have a wide variety of applications in natural language processing, speech processing, information retrieval, machine translation, sentiment analysis, question answering, and related fields. For Indian languages, research on POS tagging is still in progress. Marathi is the fourth most widely spoken language in India and is morphologically rich. In this paper, we compare the performance of Marathi POS taggers built with generative and discriminative models. Using the 32 tags specified by the Unified POS standard for Marathi, we generated a POS-tagged dataset of 1500 news sentences drawn from domains such as sports, politics, and entertainment. Naive Bayes, Decision Tree, Neural Network, K Nearest Neighbour, Hidden Markov Model, and Conditional Random Fields achieve 81%, 79%, 85%, 78%, 79%, and 86% accuracy on test data, respectively. The results show that the neural network and Conditional Random Fields give the best performance among the models compared.
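To make the generative side of the comparison concrete, the following is a minimal sketch of a bigram Hidden Markov Model tagger with Viterbi decoding and a per-token accuracy measure. It is not the implementation evaluated in the paper, which the abstract does not describe. The toy English corpus, the three-tag tagset, the add-one smoothing, and all function names (`train_hmm`, `log_prob`, `viterbi`, `token_accuracy`) are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus with made-up tags; NOT the paper's Marathi dataset.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (a smoothing choice assumed here)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # best[i][t] = (log score of best path ending in tag t at i, previous tag)
    best = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the best final tag to recover the full path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def token_accuracy(model, tagged_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = viterbi(words, model)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


model = train_hmm(TRAIN)
print(viterbi(["a", "cat", "runs"], model))
print(token_accuracy(model, TRAIN))
```

The model is generative because it estimates the joint probability of words and tags from transition and emission counts. A discriminative model such as Conditional Random Fields would instead model the tag sequence conditioned on the words, typically with richer overlapping features. The accuracy function reflects the usual per-token definition; whether the paper's reported figures use exactly this definition is an assumption.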

Keywords / Index Terms

Part of speech tagging, Generative models, Discriminative models, Naive Bayes, Decision tree, Neural network, Hidden Markov model, Conditional Random Fields
