A heterogeneous field matching method for record linkage

Steven N. Minton, Claude Nanjo, Craig A. Knoblock, Martin Michalowski, Matthew Michelson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

51 Scopus citations

Abstract

Record linkage is the process of determining that two records refer to the same entity. A key subprocess is evaluating how well the individual fields, or attributes, of the records match each other. One approach to matching fields is to use hand-written, domain-specific rules. This "expert systems" approach may perform well for specific applications, but it is not scalable. This paper describes a new machine learning approach that creates expert-like rules for field matching. In our approach, the relationship between two field values is described by a set of heterogeneous transformations. Previous machine learning methods used simple models to evaluate the distance between two fields. Our approach, in contrast, enables more sophisticated relationships to be modeled, which better capture the complex, domain-specific, common-sense phenomena that humans use to judge similarity. We compare our approach to methods that rely on simpler homogeneous models in several domains. By modeling more complex relationships, we produce more accurate results.
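To make the idea of describing a field pair by a set of heterogeneous transformations concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the transformation names (`equal`, `prefix`, `abbreviation`, `missing`), the greedy token alignment, and the helper functions `is_abbreviation`, `label_pair`, and `describe` are all assumptions chosen for the example; the abstract does not list the transformations the paper uses.

```python
from typing import List, Tuple


def is_abbreviation(short: str, full: str) -> bool:
    """True if `short` is strictly shorter than `full`, shares its first
    letter, and its characters appear in `full` in the same order."""
    if not short or len(short) >= len(full) or short[0] != full[0]:
        return False
    remaining = iter(full)
    # `ch in remaining` consumes the iterator, so order is enforced.
    return all(ch in remaining for ch in short)


def label_pair(a: str, b: str) -> str:
    """Name the (hypothetical) transformation relating two tokens, or ''."""
    if a == b:
        return "equal"
    short, full = sorted((a, b), key=len)
    if full.startswith(short):
        return "prefix"
    if is_abbreviation(short, full):
        return "abbreviation"
    return ""


def describe(field_a: str, field_b: str) -> List[Tuple[str, str, str]]:
    """Describe two field values as a list of (transformation, token_a, token_b).

    Tokens are aligned greedily; tokens with no partner are labeled 'missing'.
    """
    tokens_a = field_a.lower().replace(".", "").split()
    unused_b = field_b.lower().replace(".", "").split()
    result: List[Tuple[str, str, str]] = []
    for ta in tokens_a:
        for tb in unused_b:
            label = label_pair(ta, tb)
            if label:
                result.append((label, ta, tb))
                unused_b.remove(tb)
                break
        else:
            # No token in field_b relates to ta.
            result.append(("missing", ta, ""))
    result.extend(("missing", "", tb) for tb in unused_b)
    return result


if __name__ == "__main__":
    for row in describe("Intl. Business Machines",
                        "International Business Machines Corp"):
        print(row)
```

In a sketch like this, the list of transformation labels could serve as the features a learned model weighs when scoring a match, in contrast to a single homogeneous distance such as edit distance over the whole string. How the paper actually scores transformations is not stated in the abstract.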

Original language: English (US)
Title of host publication: Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005
Pages: 314-321
Number of pages: 8
State: Published - 2005
Event: 5th IEEE International Conference on Data Mining, ICDM 2005 - Houston, TX, United States
Duration: Nov 27 2005 to Nov 30 2005

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 5th IEEE International Conference on Data Mining, ICDM 2005
Country/Territory: United States
City: Houston, TX
Period: 11/27/05 to 11/30/05
