Decision Models for Record Linkage Using OCCT-One Class Clustering Tree
D. Angelin Ponrani
Section: Research Paper, Product Type: Journal Paper
Volume-2, Issue-11, Page no. 27-30, Nov-2014
Online published on Nov 30, 2014
Copyright © D. Angelin Ponrani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: D. Angelin Ponrani, “Decision Models for Record Linkage Using OCCT-One Class Clustering Tree,” International Journal of Computer Sciences and Engineering, Vol.2, Issue.11, pp.27-30, 2014.
MLA Style Citation: D. Angelin Ponrani "Decision Models for Record Linkage Using OCCT-One Class Clustering Tree." International Journal of Computer Sciences and Engineering 2.11 (2014): 27-30.
APA Style Citation: D. Angelin Ponrani, (2014). Decision Models for Record Linkage Using OCCT-One Class Clustering Tree. International Journal of Computer Sciences and Engineering, 2(11), 27-30.
BibTex Style Citation:
@article{Ponrani_2014,
author = {D. Angelin Ponrani},
title = {Decision Models for Record Linkage Using OCCT-One Class Clustering Tree},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2014},
volume = {2},
Issue = {11},
month = {11},
year = {2014},
issn = {2347-2693},
pages = {27-30},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=296},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=296
TI - Decision Models for Record Linkage Using OCCT-One Class Clustering Tree
T2 - International Journal of Computer Sciences and Engineering
AU - D. Angelin Ponrani
PY - 2014
DA - 2014/11/30
PB - IJCSE, Indore, INDIA
SP - 27-30
IS - 11
VL - 2
SN - 2347-2693
ER -
Abstract
Record linkage is traditionally performed between entities of the same type, which may or may not share a common identifier. In this paper we propose a linkage method that can also link matching entities of different data types. The technique is based on a one-class clustering tree (OCCT) that characterizes the entities to be linked. The tree is built so that it is easy to understand and can be transformed into association rules. Its inner nodes consist of features of the first set of entities, and its leaves represent the features of the matching entities in the second set. The data is split using two splitting criteria, and two pruning methods are used when building the tree. The proposed system shows improved precision and recall.
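The tree structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tree is built by hand rather than induced, so the two splitting criteria and the two pruning methods are not shown, and the leaf model (Laplace-smoothed value frequencies), the names `Leaf`, `Node`, `build_leaf`, `find_leaf`, `to_rules`, and the toy customer/product data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Leaf:
    """Leaf: value frequencies of the second-set records that match this path."""
    counts: Dict[str, Counter]

    def score(self, b: Dict[str, str], alpha: float = 1.0) -> float:
        # Assumed leaf model: product of Laplace-smoothed attribute probabilities.
        p = 1.0
        for attr, counter in self.counts.items():
            total = sum(counter.values())
            distinct = len(counter) + 1  # one extra slot for unseen values
            p *= (counter[b.get(attr)] + alpha) / (total + alpha * distinct)
        return p


@dataclass
class Node:
    """Inner node: tests one attribute of a first-set record."""
    attr: str
    children: Dict[str, object]


def build_leaf(b_records: List[Dict[str, str]]) -> Leaf:
    counts: Dict[str, Counter] = {}
    for rec in b_records:
        for attr, value in rec.items():
            counts.setdefault(attr, Counter())[value] += 1
    return Leaf(counts)


def find_leaf(tree, a: Dict[str, str]) -> Leaf:
    # Follow the first-set record's attribute values down to a leaf.
    while isinstance(tree, Node):
        tree = tree.children[a[tree.attr]]
    return tree


def to_rules(tree, path: Tuple[Tuple[str, str], ...] = ()) -> List[str]:
    # Each root-to-leaf path becomes one rule with the leaf's most common values.
    if isinstance(tree, Leaf):
        lhs = " AND ".join(f"A.{k}={v}" for k, v in path) or "TRUE"
        rhs = ", ".join(
            f"B.{attr}~{c.most_common(1)[0][0]}" for attr, c in tree.counts.items()
        )
        return [f"IF {lhs} THEN {rhs}"]
    rules: List[str] = []
    for value, child in tree.children.items():
        rules += to_rules(child, path + ((tree.attr, value),))
    return rules


# Hypothetical data: first set = customers, second set = products.
tree = Node("segment", {
    "student": build_leaf([{"category": "textbook"},
                           {"category": "textbook"},
                           {"category": "laptop"}]),
    "gamer": build_leaf([{"category": "console"},
                         {"category": "laptop"}]),
})

customer = {"segment": "student"}
leaf = find_leaf(tree, customer)
likely = leaf.score({"category": "textbook"})
unlikely = leaf.score({"category": "console"})
rules = to_rules(tree)
```

Here a candidate pair is scored by routing the first-set record to a leaf and asking how probable the second-set record is under that leaf, so the textbook receives a higher score than the console for a student customer. A threshold on this score would then decide whether the pair is linked; how to choose that threshold is outside the scope of this sketch.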
Key-Words / Index Term
Linkage, Clustering, Splitting, Decision Tree