
An Efficient Analysis of Deduplication among Cloud Storage

M. Raja 1, G. Lalithadevi 2, N. Sugavaneswaran 3

Section: Research Paper, Product Type: Journal Paper
Volume-06, Issue-02, Page no. 402-405, Mar-2018

Online published on Mar 31, 2018

Copyright © M. Raja, G. Lalithadevi, N. Sugavaneswaran. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



IEEE Style Citation: M. Raja, G. Lalithadevi, N. Sugavaneswaran, “An Efficient Analysis of Deduplication among Cloud Storage,” International Journal of Computer Sciences and Engineering, vol. 06, no. 02, pp. 402-405, 2018.

MLA Style Citation: M. Raja, G. Lalithadevi, N. Sugavaneswaran. "An Efficient Analysis of Deduplication among Cloud Storage." International Journal of Computer Sciences and Engineering 06.02 (2018): 402-405.

APA Style Citation: M. Raja, G. Lalithadevi, N. Sugavaneswaran (2018). An Efficient Analysis of Deduplication among Cloud Storage. International Journal of Computer Sciences and Engineering, 06(02), 402-405.

BibTex Style Citation:
@article{_2018,
author = {M. Raja and G. Lalithadevi and N. Sugavaneswaran},
title = {An Efficient Analysis of Deduplication among Cloud Storage},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2018},
volume = {06},
number = {02},
month = {3},
year = {2018},
issn = {2347-2693},
pages = {402--405},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=275},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=275
TI - An Efficient Analysis of Deduplication among Cloud Storage
T2 - International Journal of Computer Sciences and Engineering
AU - M. Raja
AU - G. Lalithadevi
AU - N. Sugavaneswaran
PY - 2018
DA - 2018/03/31
PB - IJCSE, Indore, INDIA
SP - 402
EP - 405
IS - 02
VL - 06
SN - 2347-2693
ER -

           

Abstract

In recent years, as the volume of information held in data warehouses has grown, most systems have become susceptible to replicated records. Record deduplication is a key operation when integrating data from multiple sources on a server. Data preprocessing is required to obtain high-quality information, remove replicas, and simplify the data representation. Data clean-up is one of the preprocessing steps; it includes parsing, data tree analysis, data transformation, duplicate elimination, and arithmetic methods. Two records that represent the same real-world entity are called duplicates, and the problem of detecting and eliminating them is called record deduplication. This survey analyzes the BAT algorithm, the Modified BAT algorithm, and the Hidden Face algorithm, which identify and remove duplicate records. Removing duplicate data can save the computational time and resources needed to process it.
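
To make the definition of record deduplication concrete, the following is a minimal sketch in Python. It is not the BAT, Modified BAT, or Hidden Face algorithm surveyed in the paper; it swaps in a plain string-similarity threshold as a stand-in. The function names (`normalize`, `similarity`, `deduplicate`), the sample records, and the threshold of 0.85 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Clean-up step (assumed): lower-case each field and collapse whitespace."""
    return tuple(" ".join(str(field).lower().split()) for field in record)


def similarity(a, b):
    """Average per-field string similarity in [0, 1] for two normalized records."""
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    The threshold is a hypothetical value; a real system would tune it
    (or learn it) for the data at hand.
    """
    kept = []       # original records that are retained
    kept_norm = []  # their normalized forms, used for comparison
    for record in records:
        norm = normalize(record)
        if any(similarity(norm, seen) >= threshold for seen in kept_norm):
            continue  # treated as a replica of an earlier record
        kept.append(record)
        kept_norm.append(norm)
    return kept


# Illustrative records: (name, address)
records = [
    ("John Smith", "12 Main Street"),
    ("john  smith", "12 Main Street"),
    ("Jon Smith", "12 Main St."),
    ("Alice Brown", "4 Oak Avenue"),
]
unique = deduplicate(records)
```

Each record is compared against every retained record, so the cost grows quadratically with the number of records; this is the kind of expense that motivates more efficient deduplication methods.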

Keywords / Index Terms

Data deduplication, Data Integration, Data preprocessing, BAT algorithm, Modified BAT algorithm, Hidden Face algorithm
