Open Access Article

Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility

Rameshwar Pati¹, Ajey Kumar Pathak²,³, Navita Srivastava⁴

Section: Research Paper, Product Type: Journal Paper
Volume 6, Issue 11, pp. 345-349, Nov. 2018

CrossRef DOI: https://doi.org/10.26438/ijcse/v6i11.345349

Published online on Nov 30, 2018

Copyright © Rameshwar Pati, Ajey Kumar Pathak, Navita Srivastava. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: R. Pati, A. K. Pathak, and N. Srivastava, “Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility,” International Journal of Computer Sciences and Engineering, vol. 6, no. 11, pp. 345-349, 2018.

MLA Style Citation: Pati, Rameshwar, Ajey Kumar Pathak, and Navita Srivastava. "Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility." International Journal of Computer Sciences and Engineering 6.11 (2018): 345-349.

APA Style Citation: Pati, R., Pathak, A. K., & Srivastava, N. (2018). Cross validation of supervised machine learning models based on random forest and support vector machine techniques for 12S rRNA molecular marker: Implementation, comparison and utility. International Journal of Computer Sciences and Engineering, 6(11), 345-349. https://doi.org/10.26438/ijcse/v6i11.345349

BibTeX Style Citation:
@article{Pati_2018,
author = {Rameshwar Pati and Ajey Kumar Pathak and Navita Srivastava},
title = {Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility},
journal = {International Journal of Computer Sciences and Engineering},
volume = {6},
number = {11},
month = {11},
year = {2018},
issn = {2347-2693},
pages = {345--349},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3166},
doi = {10.26438/ijcse/v6i11.345349},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i11.345349
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=3166
TI  - Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility
T2  - International Journal of Computer Sciences and Engineering
AU  - Pati, Rameshwar
AU  - Pathak, Ajey Kumar
AU  - Srivastava, Navita
PY  - 2018
DA  - 2018/11/30
PB  - IJCSE, Indore, INDIA
SP  - 345
EP  - 349
IS  - 11
VL  - 6
SN  - 2347-2693
ER  - 


Abstract

Folding plays an essential role in cross validation studies of machine learning based models. Folding divides the original sample into training and test sets, which are used to evaluate the performance of the models and to indicate how their efficacy can be optimised. The present study describes the computational approaches used to prepare training and test sets at different folds from a 12S rRNA molecular marker sequence dataset of fish, and the application of these sets to estimate the performance of the proposed models based on two machine learning techniques, viz. Random Forest and Support Vector Machine. Additionally, the study presents a comparative account of the efficacies of these models estimated at different folding levels. The findings showed a linear relationship between the folding level and the efficacy of the model. The Random Forest model was found to perform better for classifying the molecular marker sequence data. This study provides an understanding of the utility of the folding level in increasing the efficacy of machine learning based methods, and suggests a suitable machine learning method for multiclass problems, especially where identification from molecular marker sequence data is involved.
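The folding step described in the abstract can be illustrated with a minimal, standard-library-only Python sketch of k-fold cross validation. It is not the authors' implementation: the k-mer frequency encoding of sequences (`kmer_vector`), the unstratified splitter (`k_fold_indices`), the `cross_validate` helper and the `NearestCentroid` stand-in classifier are hypothetical names introduced here. The stand-in is neither a random forest nor an SVM; it only makes the loop runnable without third-party packages.

```python
import random
from collections import Counter
from itertools import product


def kmer_vector(seq, k=3):
    """Hypothetical encoding: normalised counts of overlapping k-mers over ACGT.
    (The feature representation used in the paper is not stated here.)"""
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous bases (e.g. N) are not in vocab and are ignored.
    total = max(sum(counts[w] for w in vocab), 1)
    return [counts[w] / total for w in vocab]


def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds.
    Each fold serves once as the test set; the remaining folds form the training set."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


class NearestCentroid:
    """Stand-in classifier (NOT random forest or SVM): assigns each sample
    to the class whose mean feature vector is closest in squared distance."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, v in enumerate(x):
                acc[d] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def cross_validate(X, y, k, make_model, seed=0):
    """Mean test-set accuracy over k folds; make_model returns a fresh
    object with fit(X, y) -> self and predict(X) -> labels."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        pred = model.predict([X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(pred, test)) / len(test))
    return sum(scores) / len(scores)
```

To compare models at different folding levels, sequences would first be encoded, e.g. `X = [kmer_vector(s) for s in sequences]` with hypothetical `sequences` and `labels` lists, and `cross_validate(X, labels, k, make_model)` evaluated for several values of k. If scikit-learn is available, `RandomForestClassifier` (from `sklearn.ensemble`) and `SVC` (from `sklearn.svm`) follow the same `fit`/`predict` convention, with `fit` returning the estimator, so they could be passed as `make_model`. For multiclass data with small classes, a stratified splitter such as scikit-learn's `StratifiedKFold` keeps class proportions similar across folds, which this plain splitter does not.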

Keywords / Index Terms

Machine learning method, Random forest, Support vector machine, Folding level, 12S rRNA, Cross validation
