Open Access Article

Reference Based Genomic Data Compression Using R Programming

M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose

Section: Research Paper, Product Type: Journal Paper
Volume 06, Issue 04, pp. 328-331, May 2018

Online published on May 31, 2018

Copyright © M. Mary Shanthi Rani, S. Jegatheesh Chandra Bose. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: M. Mary Shanthi Rani and S. Jegatheesh Chandra Bose, “Reference Based Genomic Data Compression Using R Programming,” International Journal of Computer Sciences and Engineering, Vol. 06, Issue 04, pp. 328-331, 2018.

MLA Style Citation: M. Mary Shanthi Rani and S. Jegatheesh Chandra Bose. "Reference Based Genomic Data Compression Using R Programming." International Journal of Computer Sciences and Engineering 06.04 (2018): 328-331.

APA Style Citation: M. Mary Shanthi Rani, & S. Jegatheesh Chandra Bose (2018). Reference Based Genomic Data Compression Using R Programming. International Journal of Computer Sciences and Engineering, 06(04), 328-331.

BibTex Style Citation:
@article{Rani_2018,
author = {M. Mary Shanthi Rani and S. Jegatheesh Chandra Bose},
title = {Reference Based Genomic Data Compression Using R Programming},
journal = {International Journal of Computer Sciences and Engineering},
volume = {06},
number = {04},
month = may,
year = {2018},
issn = {2347-2693},
pages = {328--331},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=406},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=406
TI  - Reference Based Genomic Data Compression Using R Programming
T2  - International Journal of Computer Sciences and Engineering
AU  - M. Mary Shanthi Rani
AU  - S. Jegatheesh Chandra Bose
PY  - 2018
DA  - 2018/05/31
PB  - IJCSE, Indore, INDIA
SP  - 328
EP  - 331
IS  - 04
VL  - 06
SN  - 2347-2693
ER  - 


Abstract

Genomics has become an active research area in medicine, with applications in the identification and diagnosis of monogenic disorders, pharmacogenetics, targeted therapy, genome editing, and personalized medicine. Each human genome consists of about 3 billion base pairs, which must be stored and transmitted efficiently for analysis. This necessitates the development of novel genomic data compression algorithms. In this paper, a reference-based method for compressing genomes is proposed. The input genome is compared with a reference genome to identify the differences between them, and these differences are then entropy coded to achieve a high compression ratio.
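The abstract only outlines the pipeline (compare the input with a reference, then entropy-code the differences). The sketch below illustrates that general idea under explicit assumptions; it is not the authors' R implementation. It assumes the input and reference are already aligned and of equal length, so only substitutions occur; records each mismatch as a hypothetical (gap since previous mismatch, base) pair; and uses Huffman coding as the entropy coder, as the keywords suggest. All function names (`diff_against_reference`, `apply_diffs`, `huffman_code`, `huffman_decode`) are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def diff_against_reference(target: str, reference: str) -> list[tuple[int, str]]:
    """Record substitutions as (gap since previous mismatch, target base).

    Assumption: both sequences are aligned and equal in length, so
    insertions and deletions are not handled in this sketch.
    """
    if len(target) != len(reference):
        raise ValueError("sketch assumes equal-length, aligned sequences")
    diffs = []
    last = -1
    for pos, (t, r) in enumerate(zip(target, reference)):
        if t != r:
            diffs.append((pos - last - 1, t))
            last = pos
    return diffs


def apply_diffs(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the target sequence from the reference and the difference list."""
    out = list(reference)
    pos = -1
    for gap, base in diffs:
        pos += gap + 1
        out[pos] = base
    return "".join(out)


def huffman_code(symbols) -> dict:
    """Build a prefix-free Huffman code from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def huffman_decode(bits: str, code: dict) -> list:
    """Decode a bit string produced with a prefix-free code."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out


reference = "ACGTACGTACGTACGT"
target = "ACGTACCTACGTACGA"
diffs = diff_against_reference(target, reference)
print(diffs)  # [(6, 'C'), (8, 'A')]

code = huffman_code(diffs)  # entropy-code the difference symbols
bits = "".join(code[d] for d in diffs)
assert apply_diffs(reference, huffman_decode(bits, code)) == target
```

Because only the mismatches are stored, the encoded size grows with the number of differences rather than with the genome length, which is why a close reference matters. A practical tool would also have to handle insertions and deletions and would code gaps and bases as separate streams; both are omitted here for brevity.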

Keywords / Index Terms

FASTA file, genomic data compression, R programming, Huffman coding, big data.
