Open Access Article

A Survey on Computational Algorithms For Biological Data Analysis

M.Muthu Lakshmi¹, G.Murugeswari²

Section: Survey Paper, Product Type: Journal Paper
Volume-06, Issue-04, Page no. 154-161, May-2018

Online published on May 31, 2018

Copyright © M.Muthu Lakshmi, G.Murugeswari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: M.Muthu Lakshmi, G.Murugeswari, “A Survey on Computational Algorithms For Biological Data Analysis,” International Journal of Computer Sciences and Engineering, Vol.06, Issue.04, pp.154-161, 2018.

MLA Style Citation: M.Muthu Lakshmi, G.Murugeswari. "A Survey on Computational Algorithms For Biological Data Analysis." International Journal of Computer Sciences and Engineering 06.04 (2018): 154-161.

APA Style Citation: M.Muthu Lakshmi, G.Murugeswari (2018). A Survey on Computational Algorithms For Biological Data Analysis. International Journal of Computer Sciences and Engineering, 06(04), 154-161.

BibTex Style Citation:
@article{Lakshmi_2018,
author = {M.Muthu Lakshmi and G.Murugeswari},
title = {A Survey on Computational Algorithms For Biological Data Analysis},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2018},
volume = {06},
number = {04},
month = {5},
year = {2018},
issn = {2347-2693},
pages = {154--161},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=373},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=373
TI  - A Survey on Computational Algorithms For Biological Data Analysis
T2  - International Journal of Computer Sciences and Engineering
AU  - M.Muthu Lakshmi
AU  - G.Murugeswari
PY  - 2018
DA  - 2018/05/31
PB  - IJCSE, Indore, INDIA
SP  - 154
EP  - 161
IS  - 04
VL  - 06
SN  - 2347-2693
ER  - 

           

Abstract

Bioinformatics is an interdisciplinary field that applies computational algorithms to the analysis of biological data. Researchers have investigated many tools and techniques for interpreting, analyzing, and predicting from biological data. According to recent statistics, biological sequence analysis is one of the emerging areas of the field. This paper surveys computational algorithms used in bioinformatics. We analyze the contributions of computer science researchers to biological sequence analysis and organize the survey into the following categories: biological sequencing, alignment, compression and encoding, feature extraction, clustering, and classification. The objective is to provide a deep understanding of the existing algorithms for biological data analysis and to identify research areas in bioinformatics for computer science researchers.

Keywords / Index Terms

Bioinformatics, Biological Sequences, DNA, RNA, Protein
