
Text Classification: A Comparative Analysis of Word Embedding Algorithms

R. Janani1 , S. Vijayarani2

Section:Research Paper, Product Type: Journal Paper
Volume-7 , Issue-4 , Page no. 818-822, Apr-2019

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v7i4.818822

Online published on Apr 30, 2019

Copyright © R. Janani, S. Vijayarani . This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: R. Janani, S. Vijayarani, “Text Classification: A Comparative Analysis of Word Embedding Algorithms,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.4, pp.818-822, 2019.

MLA Style Citation: R. Janani, S. Vijayarani. "Text Classification: A Comparative Analysis of Word Embedding Algorithms." International Journal of Computer Sciences and Engineering 7.4 (2019): 818-822.

APA Style Citation: R. Janani, S. Vijayarani, (2019). Text Classification: A Comparative Analysis of Word Embedding Algorithms. International Journal of Computer Sciences and Engineering, 7(4), 818-822.

BibTex Style Citation:
@article{Janani_2019,
author = {R. Janani and S. Vijayarani},
title = {Text Classification: A Comparative Analysis of Word Embedding Algorithms},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {4},
month = apr,
year = {2019},
issn = {2347-2693},
pages = {818-822},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4123},
doi = {10.26438/ijcse/v7i4.818822},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v7i4.818822
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=4123
TI - Text Classification: A Comparative Analysis of Word Embedding Algorithms
T2 - International Journal of Computer Sciences and Engineering
AU - Janani, R.
AU - Vijayarani, S.
PY - 2019
DA - 2019/04/30
PB - IJCSE, Indore, INDIA
SP - 818
EP - 822
IS - 4
VL - 7
SN - 2347-2693
ER -


Abstract

Text classification is the task of assigning documents to one or more predefined categories. This technique is widely used in information retrieval, text summarization, and text extraction. Transforming text into feature vectors is an important stage of the classification task; the main advantage of this transformation is that it discovers the most significant words in a document. This process is known as word embedding, which represents the meaning of words in vector format. The word embeddings lie in a high-dimensional space where the embeddings of similar or related words are adjacent to each other. The main aim of this research work is to classify text documents based on their contents. To achieve this, different word embedding algorithms are used to represent the documents. The performance measures are precision, recall, F-measure, and accuracy.
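As a minimal sketch of the pipeline the abstract describes — mapping each document to a vector from its word embeddings and classifying by similarity — the toy example below uses hand-picked two-dimensional embeddings (hypothetical values for illustration; in practice they would come from a trained model such as Word2Vec, GloVe, or WordRank) and assigns the label of the most similar labeled document under cosine similarity:

```python
import math

# Toy word vectors (hypothetical values): similar words lie close together.
EMBEDDINGS = {
    "ball": [0.9, 0.1], "goal": [0.8, 0.2], "team": [0.7, 0.1],
    "vote": [0.1, 0.9], "law":  [0.2, 0.8], "party": [0.1, 0.7],
}

def doc_vector(text):
    """Represent a document as the mean of its known word embeddings."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, labeled_docs):
    """Assign the label of the labeled document most similar to `text`."""
    v = doc_vector(text)
    return max(labeled_docs, key=lambda d: cosine(v, doc_vector(d[0])))[1]

train = [("ball goal team", "sports"), ("vote law party", "politics")]
print(classify("the team scored a goal", train))  # -> sports
```

This nearest-document scheme is only a stand-in for the classifiers compared in the paper; the point it illustrates is that once documents are embedded as vectors, classification reduces to geometry in the embedding space.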

Key-Words / Index Term

Text Classification, Document Representation, Word Embedding, Word2Vec, GloVe, WordRank
