Comparative Study of Machine Learning Algorithms for Document Classification

Rahul Jain, Archana Thakur

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 6, Page no. 1189-1191, Jun 2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i6.11891191

Online published on Jun 30, 2019

Copyright © Rahul Jain, Archana Thakur. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Citation

IEEE Style Citation: R. Jain and A. Thakur, “Comparative Study of Machine Learning Algorithms for Document Classification,” International Journal of Computer Sciences and Engineering, vol. 7, no. 6, pp. 1189-1191, 2019.

MLA Citation

MLA Style Citation: Jain, Rahul, and Archana Thakur. "Comparative Study of Machine Learning Algorithms for Document Classification." International Journal of Computer Sciences and Engineering 7.6 (2019): 1189-1191.

APA Citation

APA Style Citation: Jain, R., & Thakur, A. (2019). Comparative Study of Machine Learning Algorithms for Document Classification. International Journal of Computer Sciences and Engineering, 7(6), 1189-1191.

BibTex Citation

BibTex Style Citation:
@article{Jain_2019,
author = {Rahul Jain and Archana Thakur},
title = {Comparative Study of Machine Learning Algorithms for Document Classification},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2019},
volume = {7},
number = {6},
month = {6},
year = {2019},
issn = {2347-2693},
pages = {1189--1191},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4705},
doi = {10.26438/ijcse/v7i6.11891191},
publisher = {IJCSE, Indore, INDIA}
}

RIS Citation

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i6.11891191
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4705
TI  - Comparative Study of Machine Learning Algorithms for Document Classification
T2  - International Journal of Computer Sciences and Engineering
AU  - Jain, Rahul
AU  - Thakur, Archana
PY  - 2019
DA  - 2019/06/30
PB  - IJCSE, Indore, INDIA
SP  - 1189
EP  - 1191
IS  - 6
VL  - 7
SN  - 2347-2693
ER  -

Abstract

Text classification is the task of assigning predefined classes to free text. Text classifiers can be used to organize, structure, and categorize almost any kind of text. In this work, we use the random forest and naïve Bayes algorithms to perform document classification. We train the machine learning models to infer the class of each document. Working on large data sets of movie reviews, the chosen models predict whether each review is positive or negative. We then analyse and compare the results using each model's confusion matrix and the metrics reported alongside it: precision, recall, F1-score, and support. An important observation is that, for the same input data, random forest provides more relevant results than naïve Bayes. As the training data grows, however, naïve Bayes performs as well as random forest.
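
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is written in plain Python and is not the authors' code; the function name per_class_metrics and the eight example labels are hypothetical and were chosen only to show how precision, recall, F1-score, and support follow from confusion-matrix counts for a positive/negative review task.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels=("neg", "pos")):
    """Return precision, recall, F1 and support for each class label."""
    # Confusion-matrix counts keyed by (true label, predicted label).
    confusion = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        others = [other for other in labels if other != label]
        tp = confusion[(label, label)]                       # correctly predicted as label
        fp = sum(confusion[(other, label)] for other in others)  # wrongly predicted as label
        fn = sum(confusion[(label, other)] for other in others)  # label predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true instances of this class
        }
    return report


# Hypothetical labels for eight reviews (illustrative only, not the paper's data).
y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "pos"]

report = per_class_metrics(y_true, y_pred)
for label, scores in report.items():
    print(label, scores)
```

Running the same function on the predictions of two classifiers for one test set would give per-class figures that can be compared side by side, which is the kind of comparison the abstract describes.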

Keywords / Index Terms

Text Classification, Naïve Bayes, Random Forest, Machine Learning

ISSN :	2347-2693 (Online)