Automatically utilizing secondary sources to align information across sources

Martin Michalowski, Snehal Thakkar, Craig A. Knoblock

Research output: Contribution to journal › Article

12 Scopus citations

Abstract

XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.
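The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the record fields, the `GEOCODER` lookup table (a hypothetical stand-in for a secondary web source), the equal weighting, and the 0.6 threshold are all assumptions chosen for illustration, and the automatic selection of which secondary sources to query is omitted. The sketch only shows how an attribute fetched from a secondary source can correct a link that name similarity alone gets wrong.

```python
from difflib import SequenceMatcher
from math import hypot

# Hypothetical secondary source: maps an address string to (lat, lon).
# In practice this would be a query to an external web source.
GEOCODER = {
    "12 Main St": (34.0500, -118.2500),
    "12 Main Street": (34.0500, -118.2500),
    "98 Ocean Ave": (33.9900, -118.4700),
}

# Two illustrative primary sources describing overlapping entities.
SOURCE_A = [{"name": "Joe's Diner", "address": "12 Main St"}]
SOURCE_B = [
    {"name": "Joes Restaurant", "address": "12 Main Street"},
    {"name": "Joe's Diner Express", "address": "98 Ocean Ave"},
]


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def enrich(record):
    """Return a copy of the record with coordinates from the secondary source."""
    return {**record, "coords": GEOCODER.get(record["address"])}


def link_score(r1, r2):
    """Name similarity, averaged with a location match when both are geocoded."""
    score = name_similarity(r1["name"], r2["name"])
    c1, c2 = r1.get("coords"), r2.get("coords")
    if c1 and c2:
        distance = hypot(c1[0] - c2[0], c1[1] - c2[1])
        same_place = 1.0 if distance < 0.001 else 0.0
        score = 0.5 * score + 0.5 * same_place  # assumed equal weighting
    return score


def link(source_a, source_b, threshold=0.6, use_secondary=True):
    """Link each record in source_a to its best-scoring record in source_b."""
    if use_secondary:
        source_a = [enrich(r) for r in source_a]
        source_b = [enrich(r) for r in source_b]
    matches = []
    for ra in source_a:
        best = max(source_b, key=lambda rb: link_score(ra, rb))
        if link_score(ra, best) >= threshold:
            matches.append((ra["name"], best["name"]))
    return matches
```

With this toy data, calling `link(SOURCE_A, SOURCE_B, use_secondary=False)` pairs "Joe's Diner" with the similarly named but differently located "Joe's Diner Express", whereas the default call, which consults the secondary source, pairs it with "Joes Restaurant" at the same location.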

Original language: English (US)
Pages (from-to): 33-44
Number of pages: 12
Journal: AI Magazine
Volume: 26
Issue number: 1
State: Published - Mar 1 2005
