Open Access Article

Factored Language Modeling

A.R. Babhulgaonkar1 , S.P. Sonavane2

Section: Research Paper, Product Type: Journal Paper
Volume 06, Issue 01, Pages 19-25, Feb 2018

CrossRef DOI: https://doi.org/10.26438/ijcse/v6si1.1925

Published online on Feb 28, 2018

Copyright © A.R. Babhulgaonkar, S.P. Sonavane. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: A.R. Babhulgaonkar, S.P. Sonavane, “Factored Language Modeling,” International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.19-25, 2018.

MLA Style Citation: Babhulgaonkar, A.R., and S.P. Sonavane. "Factored Language Modeling." International Journal of Computer Sciences and Engineering 06.01 (2018): 19-25.

APA Style Citation: Babhulgaonkar, A.R., & Sonavane, S.P. (2018). Factored Language Modeling. International Journal of Computer Sciences and Engineering, 06(01), 19-25.

BibTex Style Citation:
@article{Babhulgaonkar_2018,
author = {A.R. Babhulgaonkar and S.P. Sonavane},
title = {Factored Language Modeling},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {2 2018},
volume = {06},
number = {01},
month = {2},
year = {2018},
issn = {2347-2693},
pages = {19--25},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=185},
doi = {10.26438/ijcse/v6i1.1925},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i1.1925
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=185
TI  - Factored Language Modeling
T2  - International Journal of Computer Sciences and Engineering
AU  - Babhulgaonkar, A.R.
AU  - Sonavane, S.P.
PY  - 2018
DA  - 2018/02/28
PB  - IJCSE, Indore, INDIA
SP  - 19
EP  - 25
IS  - 01
VL  - 06
SN  - 2347-2693
ER  - 


Abstract

Language modeling is the task of estimating the probability of a word sequence, which lets a system predict the most probable next word in a sentence. It is an essential component for the successful implementation of natural language processing applications such as machine translation and speech recognition, where it helps ensure the correctness and fluency of the target output. The traditional n-gram approach predicts the next word using only the preceding n-1 words in the sentence. Factored language modeling instead represents each word as a bundle of linguistic features, or factors (for example, its stem or part-of-speech tag), and uses this knowledge along with the word itself to construct the language model. This paper describes the factored language modeling technique and compares its results with those of the traditional n-gram technique, using perplexity as the evaluation measure.
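To make the n-gram baseline and the perplexity measure concrete, the following Python sketch trains a word bigram model (n = 2) and scores a test sentence. It is a minimal illustration under stated assumptions, not the authors' setup: the add-one (Laplace) smoothing, the `<s>`/`</s>` sentence markers, the closed vocabulary, and the function names `train_bigram`, `bigram_prob`, and `perplexity` are hypothetical choices made for this example. Perplexity is computed as exp(-(1/N) Σ log P(w_i | w_{i-1})) over the N predicted tokens, so a lower value means the model finds the test text less surprising.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary markers (hypothetical choice)


def train_bigram(sentences):
    """Count bigrams and their left contexts over padded training sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {EOS}  # every token the model may predict (closed vocabulary)
    for sent in sentences:
        tokens = [BOS] + sent + [EOS]
        vocab.update(sent)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, model):
    """Add-one smoothed P(word | prev); sums to 1 over the training vocabulary."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(sentences, model):
    """exp of the average negative log-probability per predicted token (including </s>)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = [BOS] + sent + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log(bigram_prob(prev, word, model))
            n += 1
    return math.exp(-log_sum / n)


train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
test = [["the", "cat", "sat"]]
model = train_bigram(train)
print(round(perplexity(test, model), 2))
```

With this toy data the test sentence receives probability 3/7 · 2/7 · 1/3 · 3/7 = 6/343 over N = 4 predicted tokens, so the printed perplexity is (343/6)^(1/4) ≈ 2.75. Add-one smoothing is used only because it is short; better-performing smoothing methods exist, and the paper's own estimator is not specified here.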

Keywords / Index Terms

Language model, Perplexity, Factored language model, Backoff.
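The factored approach and backoff can be illustrated in the same hedged way. In the Python sketch below, each training token is a (word, tag) bundle, and the probability of the next word is conditioned on both factors of the previous token, then blended with estimates from progressively smaller contexts: the previous tag alone, and finally a unigram distribution. This is an assumption-laden toy: the two-factor representation, the fixed backoff order, the class name `FactoredBackoffLM`, and the use of interpolated absolute discounting (discount 0.5) over an add-one unigram base are illustrative choices. It is not the generalized parallel backoff of Bilmes and Kirchhoff, nor the configuration used in the paper.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = ("<s>", "<s>"), ("</s>", "</s>")  # assumed padding bundles (word, tag)


class FactoredBackoffLM:
    """Hypothetical two-factor LM with a fixed path: P(w | w_prev, t_prev) -> P(w | t_prev) -> P(w).

    Each level uses interpolated absolute discounting, so every level sums to 1
    over the closed vocabulary as long as 0 < discount < 1.
    """

    def __init__(self, tagged_sentences, discount=0.5):
        self.d = discount
        self.vocab = {EOS[0]}
        self.unigram = Counter()
        self.ctx = {2: Counter(), 1: Counter()}    # context totals per level
        self.joint = {2: Counter(), 1: Counter()}  # (context, word) counts per level
        self.followers = {2: defaultdict(set), 1: defaultdict(set)}  # distinct next words
        for sent in tagged_sentences:
            tokens = [BOS] + sent + [EOS]
            for (pw, pt), (w, _) in zip(tokens, tokens[1:]):
                self.vocab.add(w)
                self.unigram[w] += 1
                for level, context in ((2, (pw, pt)), (1, pt)):
                    self.ctx[level][context] += 1
                    self.joint[level][(context, w)] += 1
                    self.followers[level][context].add(w)
        self.total = sum(self.unigram.values())

    def p_unigram(self, w):
        """Add-one smoothed unigram: the normalized base of the backoff chain."""
        return (self.unigram[w] + 1) / (self.total + len(self.vocab))

    def _interpolate(self, level, context, w, lower):
        n = self.ctx[level][context]
        if n == 0:
            return lower  # unseen context: rely entirely on the smaller context
        discounted = max(self.joint[level][(context, w)] - self.d, 0) / n
        gamma = self.d * len(self.followers[level][context]) / n  # mass freed by discounting
        return discounted + gamma * lower

    def prob(self, w, prev_word, prev_tag):
        p_tag = self._interpolate(1, prev_tag, w, self.p_unigram(w))
        return self._interpolate(2, (prev_word, prev_tag), w, p_tag)

    def perplexity(self, tagged_sentences):
        log_sum, n = 0.0, 0
        for sent in tagged_sentences:
            tokens = [BOS] + sent + [EOS]
            for (pw, pt), (w, _) in zip(tokens, tokens[1:]):
                log_sum += math.log(self.prob(w, pw, pt))
                n += 1
        return math.exp(-log_sum / n)


train = [[("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
         [("a", "DT"), ("dog", "NN"), ("ran", "VBD")]]
lm = FactoredBackoffLM(train)
print(lm.prob("dog", "the", "DT"), lm.prob("ran", "the", "DT"))
```

Here `the` is never followed by `dog` in training, but the tag `DT` is (through `a dog`), so the tag-level estimate gives `dog` more probability after `the` than it gives `ran`. A word-only bigram model trained on the same two sentences assigns `dog` and `ran` the same probability after `the`, since both are unseen there and each occurs once overall; sharing evidence through factors is what the factored model adds.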
