Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification

Varun Saproo, Rujuta Upadhyay, Manisha Valera

Section: Survey Paper, Product Type: Journal Paper
Volume-7, Issue-4, Page no. 933-937, Apr-2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i4.933937

Online published on Apr 30, 2019

Copyright © Varun Saproo, Rujuta Upadhyay, Manisha Valera. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Varun Saproo, Rujuta Upadhyay, Manisha Valera, “Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.4, pp.933-937, 2019.

MLA Style Citation: Varun Saproo, Rujuta Upadhyay, Manisha Valera. "Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification." International Journal of Computer Sciences and Engineering 7.4 (2019): 933-937.

APA Style Citation: Varun Saproo, Rujuta Upadhyay, Manisha Valera (2019). Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification. International Journal of Computer Sciences and Engineering, 7(4), 933-937.

BibTeX Style Citation:
@article{Saproo_2019,
author = {Varun Saproo and Rujuta Upadhyay and Manisha Valera},
title = {Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2019},
volume = {7},
number = {4},
month = {4},
year = {2019},
issn = {2347-2693},
pages = {933--937},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4144},
doi = {10.26438/ijcse/v7i4.933937},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i4.933937
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4144
TI  - Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification
T2  - International Journal of Computer Sciences and Engineering
AU  - Varun Saproo
AU  - Rujuta Upadhyay
AU  - Manisha Valera
PY  - 2019
DA  - 2019/04/30
PB  - IJCSE, Indore, INDIA
SP  - 933
EP  - 937
IS  - 4
VL  - 7
SN  - 2347-2693
ER  - 

Abstract

Genetic testing and precision medicine have changed how diseases such as cancer are treated. Classifying genetic mutations remains time-consuming: a clinical pathologist must manually review and classify every mutation based on evidence from text-based clinical literature, which takes considerable human effort and time. In this paper, we survey different machine learning models with the intent of automating mutation classification. Additionally, to speed up the learning process while maintaining accuracy, the Jeffreys-Multi-Hypothesis (JMH) divergence method is used to select words with large discriminative capacity for text classification. Text encoding schemes such as BoW (Bag-of-Words), TF-IDF (Term Frequency-Inverse Document Frequency), and graph-based TW-IDF (Term Weight-Inverse Document Frequency) are used to encode text in numerical form. The macro-averaged F1-score is used to score performance during feature selection and model evaluation. This paper compares the specified methods and attempts to conclude which performs better.
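
As an illustration of the first two encoding schemes named in the abstract, the following is a minimal Python sketch of BoW and TF-IDF, not the implementation surveyed in the paper. It assumes whitespace tokenization and the plain idf = log(N / df) weighting without smoothing; the function names `bag_of_words` and `tf_idf` and the three toy sentences are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives every term a stable column index.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight raw term counts by idf = log(N / df) (assumed unsmoothed variant)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, weights


# Hypothetical toy corpus, not taken from the paper's data.
docs = [
    "brca1 mutation loss of function",
    "egfr mutation gain of function",
    "brca1 truncating mutation",
]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
```

In this sketch the word "mutation" appears in every toy document, so its idf is log(3/3) = 0 and it receives zero TF-IDF weight, whereas BoW counts it the same as any other term. This is the sense in which TF-IDF down-weights terms that do not discriminate between documents.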

Key-Words / Index Term

BoW, TF-IDF, TW-IDF, JMH Divergence, Precision, Recall, F1-Score
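
The precision, recall and F1-score listed above combine into the macro-averaged F1-score used for evaluation: F1 is computed per class and the per-class values are averaged with equal weight. The following Python sketch shows that calculation under the usual convention that an undefined precision, recall or F1 (zero denominator) counts as 0; the function name `macro_f1` and the label vectors are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Assumed convention: a zero denominator yields a score of 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for three mutation classes.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
score = macro_f1(y_true, y_pred)
```

For these labels the per-class F1 values are 0.5, 0.8 and 2/3, so the macro average is about 0.656. Because every class contributes equally, a rare class that is classified poorly lowers the score as much as a frequent one would.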
