Open Access Article

Comparative Study of Machine Learning Algorithms for Document Classification

Rahul Jain¹, Archana Thakur²

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 6, pp. 1189-1191, June 2019

CrossRef DOI: https://doi.org/10.26438/ijcse/v7i6.11891191

Online published on Jun 30, 2019

Copyright © Rahul Jain, Archana Thakur. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: R. Jain and A. Thakur, “Comparative Study of Machine Learning Algorithms for Document Classification,” International Journal of Computer Sciences and Engineering, vol. 7, no. 6, pp. 1189-1191, 2019.

MLA Style Citation: Jain, Rahul, and Archana Thakur. “Comparative Study of Machine Learning Algorithms for Document Classification.” International Journal of Computer Sciences and Engineering 7.6 (2019): 1189-1191.

APA Style Citation: Jain, R., & Thakur, A. (2019). Comparative study of machine learning algorithms for document classification. International Journal of Computer Sciences and Engineering, 7(6), 1189-1191. https://doi.org/10.26438/ijcse/v7i6.11891191

BibTeX Style Citation:
@article{Jain_2019,
author = {Rahul Jain and Archana Thakur},
title = {Comparative Study of Machine Learning Algorithms for Document Classification},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {6},
month = {6},
year = {2019},
issn = {2347-2693},
pages = {1189--1191},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4705},
doi = {10.26438/ijcse/v7i6.11891191},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i6.11891191
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4705
TI  - Comparative Study of Machine Learning Algorithms for Document Classification
T2  - International Journal of Computer Sciences and Engineering
AU  - Jain, Rahul
AU  - Thakur, Archana
PY  - 2019
DA  - 2019/06/30
PB  - IJCSE, Indore, INDIA
SP  - 1189
EP  - 1191
IS  - 6
VL  - 7
SN  - 2347-2693
ER  - 


Abstract

Text classification is the task of assigning free text to one of a set of predefined classes. Text classifiers can be used to organize, structure, and categorize just about any kind of text. In this work we use the random forest and naïve Bayes algorithms to perform a document classification task, training each model to infer the class of a document. Working on a large data set of movie reviews, the chosen models predict whether each review is positive or negative; we then analyse and compare each model's confusion matrix and the per-class metrics derived from it: precision, recall, F1-score, and support. An important observation is that, on the same input data, random forest gives better results than naïve Bayes. However, as the training data grows, naïve Bayes performs as well as random forest.
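The abstract does not say how the two models were implemented, so the following is only a minimal sketch of the naïve Bayes side of such a pipeline, assuming a multinomial event model with add-one (Laplace) smoothing over bags of lowercase word tokens. The six training reviews, the three test reviews, and the names `tokenize`, `MultinomialNaiveBayes`, and `confusion_and_metrics` are hypothetical illustrations, not the authors' code or data. A random forest is left out because a from-scratch version does not fit in a short example; in practice one would typically reach for a library implementation such as scikit-learn's `RandomForestClassifier`. The second function shows how precision, recall, F1-score, and support follow from a confusion matrix.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text, turn punctuation into spaces, and split on whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


class MultinomialNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        label_counts = Counter(labels)
        # log P(class): fraction of training documents in each class
        self.log_prior = {c: math.log(label_counts[c] / len(docs)) for c in self.classes}
        # total number of word tokens observed in each class
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                # smoothed log P(word | class)
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)  # class with the highest log posterior

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def confusion_and_metrics(y_true, y_pred, classes):
    """Build a confusion matrix (rows = true, columns = predicted) and per-class metrics."""
    cm = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    metrics = {}
    for c in classes:
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in classes if t != c)  # predicted c, actually another class
        fn = sum(cm[c][p] for p in classes if p != c)  # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = sum(cm[c].values())  # number of true instances of class c
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1, "support": support}
    return cm, metrics


# Hypothetical toy movie reviews (not the paper's data set).
train_docs = [
    "a wonderful, moving film with great acting",
    "brilliant script and a great cast",
    "I loved every minute, truly wonderful",
    "a dull, boring mess with terrible acting",
    "awful plot and boring characters",
    "I hated it, a terrible waste of time",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_docs = [
    "great acting and a wonderful story",
    "boring and terrible",
    "a great cast but an awful, boring plot",
]
test_labels = ["pos", "neg", "neg"]

if __name__ == "__main__":
    model = MultinomialNaiveBayes().fit(train_docs, train_labels)
    predictions = model.predict(test_docs)
    print(predictions)  # → ['pos', 'neg', 'neg']
    cm, metrics = confusion_and_metrics(test_labels, predictions, model.classes)
    print(cm)
    for label, m in metrics.items():
        print(label, m)
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared in one class from driving that class's probability to zero.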

Keywords / Index Terms

Text Classification, Naïve Bayes, Random Forest, Machine Learning
