TY - JOUR
T1 - Entity Identification in Database Integration
AU - Lim, Ee Peng
AU - Srivastava, Jaideep
AU - Prabhakar, Satya
AU - Richardson, James
N1 - Copyright 2018 Elsevier B.V., All rights reserved.
PY - 1996/2
Y1 - 1996/2
N2 - The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level, assuming that schema-level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity-identification technique. To achieve soundness, a set of identity and distinctness rules has to be established for the entities in the integrated world. We then propose the use of an extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule to determine the equivalence between tuples from relations that may not share any common key. Instance-level functional dependencies (ILFDs), a form of semantic constraint information about the real-world entities, are used to derive the missing extended-key attribute values of a tuple. Formal properties of ILFDs are derived. Results from a Prolog-based prototype entity-identification system are presented.
AB - The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level, assuming that schema-level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity-identification technique. To achieve soundness, a set of identity and distinctness rules has to be established for the entities in the integrated world. We then propose the use of an extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule to determine the equivalence between tuples from relations that may not share any common key. Instance-level functional dependencies (ILFDs), a form of semantic constraint information about the real-world entities, are used to derive the missing extended-key attribute values of a tuple. Formal properties of ILFDs are derived. Results from a Prolog-based prototype entity-identification system are presented.
UR - http://www.scopus.com/inward/record.url?scp=0030083481&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030083481&partnerID=8YFLogxK
U2 - 10.1016/0020-0255(95)00185-9
DO - 10.1016/0020-0255(95)00185-9
M3 - Article
AN - SCOPUS:0030083481
SN - 0020-0255
VL - 89
SP - 1
EP - 38
JO - Information Sciences
JF - Information Sciences
IS - 1-2
ER -