Reference Based Genomic Data Compression Using R Programming
M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose
Section: Research Paper, Product Type: Journal Paper
Volume-06, Issue-04, Page no. 328-331, May-2018
Online published on May 31, 2018
Copyright © M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose, “Reference Based Genomic Data Compression Using R Programming,” International Journal of Computer Sciences and Engineering, Vol.06, Issue.04, pp.328-331, 2018.
MLA Style Citation: M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose. "Reference Based Genomic Data Compression Using R Programming." International Journal of Computer Sciences and Engineering 06.04 (2018): 328-331.
APA Style Citation: M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose (2018). Reference Based Genomic Data Compression Using R Programming. International Journal of Computer Sciences and Engineering, 06(04), 328-331.
BibTex Style Citation:
@article{Rani_2018,
author = {M. Mary Shanthi Rani and S. Jegatheesh Chandra Bose},
title = {Reference Based Genomic Data Compression Using R Programming},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2018},
volume = {06},
Issue = {04},
month = {5},
year = {2018},
issn = {2347-2693},
pages = {328-331},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=406},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=406
TI - Reference Based Genomic Data Compression Using R Programming
T2 - International Journal of Computer Sciences and Engineering
AU - M. Mary Shanthi Rani
AU - S. Jegatheesh Chandra Bose
PY - 2018
DA - 2018/05/31
PB - IJCSE, Indore, INDIA
SP - 328
EP - 331
IS - 04
VL - 06
SN - 2347-2693
ER -
Abstract
Genomics has become an active research area in medicine, with applications in the diagnosis of monogenic disorders, pharmacogenetics, targeted therapy, genome editing, and personalized medicine. Each human genome consists of about 3 billion base pairs, which must be stored and transmitted efficiently for analysis. This necessitates the development of novel genomic data compression algorithms. In this paper, a reference-based method for compressing genomes is proposed. The input genome is compared with a reference genome, and the dissimilarities between them are entropy coded to achieve a high compression ratio.
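The idea summarized in the abstract (compare the input genome against a reference, keep only the dissimilarities, then entropy code them) can be illustrated with a minimal sketch. This is not the authors' implementation, which was written in R and is not shown on this page; the Python code below is an illustration under stated assumptions: the two sequences have equal length, only substitutions are handled (no insertions or deletions), and Huffman coding (listed in the keywords) is used as the entropy coder. All function names are hypothetical.

```python
import heapq
from collections import Counter


def diff_against_reference(reference, target):
    """List (position, base) pairs where target differs from reference.

    Assumption: equal-length sequences, substitutions only.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def huffman_table(symbols):
    """Build a prefix-free code table {symbol: bitstring} from symbol counts."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode_diffs(diffs):
    """Turn diffs into a stream gap, base, gap, base, ... and Huffman code it."""
    stream, prev = [], 0
    for pos, base in diffs:
        stream.extend([pos - prev, base])  # store the gap since the last diff
        prev = pos
    table = huffman_table(stream)
    return "".join(table[s] for s in stream), table


def decode_diffs(bits, table):
    """Invert encode_diffs: recover the (position, base) pairs."""
    inverse = {code: sym for sym, code in table.items()}
    stream, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            stream.append(inverse[current])
            current = ""
    diffs, pos = [], 0
    for gap, base in zip(stream[0::2], stream[1::2]):
        pos += gap
        diffs.append((pos, base))
    return diffs


def reconstruct(reference, diffs):
    """Rebuild the target sequence from the reference and the diffs."""
    seq = list(reference)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)


reference = "ACGTACGTAC"
target = "ACGTTCGTAA"
diffs = diff_against_reference(reference, target)  # [(4, 'T'), (9, 'A')]
bits, table = encode_diffs(diffs)
restored = reconstruct(reference, decode_diffs(bits, table))
print(restored == target)
```

In this sketch only the bitstring and the code table would need to be stored alongside a pointer to the reference. A realistic genome compressor would also have to handle insertions, deletions, and FASTA headers, none of which are covered here.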
Keywords / Index Terms
FASTA file, Genomic data compression, R programming, Huffman coding, Big Data.