
Identification of Duplicate Chunks Using Content Approach

Gagandeep Kaur¹, Mandeep Singh Devgan²

  1. Department of Information Technology, Chandigarh Engineering College, Mohali, India.
  2. Department of Information Technology, Chandigarh Engineering College, Mohali, India.

Correspondence should be addressed to: kaurgagandeeparora@gmail.com.

Section: Research Paper, Product Type: Journal Paper
Volume-5, Issue-10, Page no. 110-117, Oct-2017

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v5i10.110117

Online published on Oct 30, 2017

Copyright © Gagandeep Kaur, Mandeep Singh Devgan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Gagandeep Kaur, Mandeep Singh Devgan, “Identification of Duplicate Chunks Using Content Approach,” International Journal of Computer Sciences and Engineering, Vol.5, Issue.10, pp.110-117, 2017.

MLA Style Citation: Gagandeep Kaur, Mandeep Singh Devgan "Identification of Duplicate Chunks Using Content Approach." International Journal of Computer Sciences and Engineering 5.10 (2017): 110-117.

APA Style Citation: Gagandeep Kaur, Mandeep Singh Devgan, (2017). Identification of Duplicate Chunks Using Content Approach. International Journal of Computer Sciences and Engineering, 5(10), 110-117.

BibTex Style Citation:
@article{Kaur_2017,
author = {Gagandeep Kaur and Mandeep Singh Devgan},
title = {Identification of Duplicate Chunks Using Content Approach},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {10 2017},
volume = {5},
Issue = {10},
month = {10},
year = {2017},
issn = {2347-2693},
pages = {110-117},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1484},
doi = {https://doi.org/10.26438/ijcse/v5i10.110117},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v5i10.110117
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=1484
TI - Identification of Duplicate Chunks Using Content Approach
T2 - International Journal of Computer Sciences and Engineering
AU - Gagandeep Kaur
AU - Mandeep Singh Devgan
PY - 2017
DA - 2017/10/30
PB - IJCSE, Indore, INDIA
SP - 110
EP - 117
IS - 10
VL - 5
SN - 2347-2693
ER -

Abstract

This article discusses the implementation of functions for identifying duplicate chunks using the block, file, and content approaches. Chunking and hashing functions form the core of deduplication algorithms; the level at which data is chunked is also referred to as deduplication granularity. The analysis of these three methods shows that the content approach is slightly slower but more accurate than the file and block strategies: it is about 0.2-0.3% slower at identifying duplicate chunks, while its accuracy is higher by 1-2%. This work is useful for building duplicate-content-aware applications, especially for checking multiple patterns, matching paraphrased content, and detecting plagiarism. The proposed methods can be used for inline as well as post-processing deduplication, and they can be extended to include background and foreground processing.
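
The three granularities named above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the block size, and the rolling-hash parameters (`window`, `mask`, `base`, `mod`) are all assumptions chosen for illustration. The content approach is shown here as a simple polynomial rolling hash that cuts a chunk wherever the hash of the most recent bytes matches a bit pattern; SHA-256 digests are then used to detect duplicate chunks.

```python
import hashlib


def file_chunks(data: bytes):
    """File approach: the whole file is treated as a single chunk."""
    return [data]


def fixed_chunks(data: bytes, size: int = 8):
    """Block approach: cut at fixed byte offsets (size is an assumed value)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data: bytes, window: int = 4, mask: int = 0x07,
                   base: int = 257, mod: int = 1_000_000_007):
    """Content approach (illustrative): cut where a polynomial rolling hash
    of the last `window` bytes has its masked low bits equal to zero.
    All parameters are hypothetical, not taken from the paper."""
    chunks, start, h = [], 0, 0
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= window:
            # Remove the contribution of the byte that slides out.
            h = (h - data[i - window] * top) % mod
        h = (h * base + b) % mod
        # Enforce a minimum chunk length of `window` bytes.
        if i + 1 - start >= window and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def count_duplicates(chunks):
    """Return (duplicate_count, total_count) by comparing SHA-256 digests."""
    seen, duplicates = set(), 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates, len(chunks)


if __name__ == "__main__":
    data = b"abcdefgh" * 4
    print(count_duplicates(fixed_chunks(data, 8)))  # (3, 4)
    print(count_duplicates(content_chunks(data)))
    print(count_duplicates(content_chunks(b"X" + data)))
```

With fixed blocks, inserting a single byte at the front of the data shifts every block boundary, so previously identical blocks no longer match. Content-defined boundaries depend on the bytes themselves, so they tend to realign after an insertion, at the cost of computing a rolling hash over every byte. That trade-off is in line with the abstract's observation that the content approach is slightly slower but more accurate.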

Keywords / Index Terms

Data Deduplication, Duplicate Chunks, Hashing, Execution Time, Polynomial Chunking
