TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT

A. Ghouse Mohiddin, S. Ramakrishna1

  1. Dept. of Computer Science, Dravidian University, Kuppam, A.P., India.
  2. Dept. of Computer Science, Sri Venkateswara University, Tirupathi, A.P., India.

Section: Review Paper, Product Type: Journal Paper
Volume-6, Issue-4, Page no. 271-276, Apr-2018

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v6i4.271276

Online published on Apr 30, 2018

Copyright © A. Ghouse Mohiddin, S. Ramakrishna. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: A. Ghouse Mohiddin and S. Ramakrishna, “TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.4, pp.271-276, 2018.

MLA Style Citation: A. Ghouse Mohiddin, and S. Ramakrishna. "TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT." International Journal of Computer Sciences and Engineering 6.4 (2018): 271-276.

APA Style Citation: A. Ghouse Mohiddin, & S. Ramakrishna (2018). TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT. International Journal of Computer Sciences and Engineering, 6(4), 271-276.

BibTex Style Citation:
@article{Ramakrishna_2018,
author = {A. Ghouse Mohiddin and S. Ramakrishna},
title = {TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2018},
volume = {6},
number = {4},
month = {4},
year = {2018},
issn = {2347-2693},
pages = {271-276},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1883},
doi = {https://doi.org/10.26438/ijcse/v6i4.271276},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v6i4.271276
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=1883
TI - TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT
T2 - International Journal of Computer Sciences and Engineering
AU - A. Ghouse Mohiddin
AU - S. Ramakrishna
PY - 2018
DA - 2018/04/30
PB - IJCSE, Indore, INDIA
SP - 271
EP - 276
IS - 4
VL - 6
SN - 2347-2693
ER -


Abstract

We classify the data quality problems that are addressed by data cleaning and provide an overview of the principal solution approaches. Data cleansing is particularly needed when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. We also discuss current tool support for data cleansing. Data profiling is a specific form of data analysis, applied to customer data, that detects and characterizes important features of data sets. This analysis offers a delineation of data structure, content, rules, and relationships by using statistical methodologies to deliver a set of standard characteristics about the data: data types, field lengths and cardinality of columns, granularity, value sets, format patterns, content patterns, implied rules, and cross-column and cross-file data relationships together with the cardinality of those relationships. Data deduplication has been advocated as a promising and effective technique for saving storage space by removing duplicated data from data centres or clouds. It is the process of identifying redundancy in data and then removing it. The resulting unique data are consolidated into a single format using data cleansing and data standardization. Scorecards are used to measure data quality progress and are shared with stakeholders through a URL link.
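
As an illustration only, the steps named in the abstract (profiling a column, standardizing values, removing duplicates, and computing a scorecard measure) can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of the paper: the function names, the standardization rule (trim, collapse whitespace, lowercase), the completeness measure, and the sample customer records are all hypothetical and were chosen for the example.

```python
import re
from collections import Counter


def format_pattern(value):
    """Reduce a value to its format pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_column(values):
    """Return basic profiling characteristics for one column of strings."""
    present = [v for v in values if v not in (None, "")]
    lengths = [len(v) for v in present]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "cardinality": len(set(present)),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "patterns": Counter(format_pattern(v) for v in present),
    }


def standardize(record):
    """Assumed standardization rule: trim, collapse whitespace, lowercase."""
    return tuple(" ".join((field or "").split()).lower() for field in record)


def deduplicate(records):
    """Keep the first record seen for each standardized key."""
    seen = {}
    for record in records:
        seen.setdefault(standardize(record), record)
    return list(seen.values())


def completeness_score(values):
    """Percentage of non-missing values, one possible scorecard measure."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v not in (None, ""))
    return round(100.0 * filled / len(values), 1)


# Invented sample records: (name, postal code). Not taken from the paper.
customers = [
    ("Asha Rao", "517-426"),
    ("asha  rao ", "517-426"),
    ("Ravi Kumar", ""),
    ("Meena Iyer", "517502"),
]

postal_codes = [c[1] for c in customers]
postal_profile = profile_column(postal_codes)
unique_customers = deduplicate(customers)
postal_completeness = completeness_score(postal_codes)
```

In this sketch the profile exposes two competing format patterns for the postal code column, which is the kind of finding that would drive a standardization rule, and the completeness percentage is one value that a data quality scorecard might track over time.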

Keywords / Index Terms

Data Analysis, Data Profiling, Data Cleansing, Data Standardization, Data Score Cards
