Automatically utilizing secondary sources to align information across sources

Martin Michalowski, Snehal Thakkar, Craig A. Knoblock

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.
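The idea in the abstract can be illustrated with a toy sketch. It is not the authors' system: the records, the lookup table standing in for a geocoder, the token-overlap similarity, the equal weighting, and the 0.8 threshold are all assumptions made up for this example. It shows only how a secondary source, chosen because both records carry the attribute it needs, can change which records are linked.

```python
import re

# Hypothetical records from two primary sources (invented for illustration).
SOURCE_A = [{"id": "a1", "name": "Joe's Pizza", "address": "10 Main Street"}]
SOURCE_B = [
    {"id": "b1", "name": "Joes Pizza Restaurant", "address": "10 Main St."},
    {"id": "b2", "name": "Joe's Pizza", "address": "500 Ocean Ave"},
]

# Hypothetical secondary source: a lookup table standing in for a geocoder.
GEOCODER_TABLE = {
    "10 Main Street": (34.0, -118.0),
    "10 Main St.": (34.0, -118.0),
    "500 Ocean Ave": (33.9, -118.4),
}

# Registry describing which attribute each secondary source takes as input.
SECONDARY_SOURCES = {
    "geocoder": {"input": "address", "lookup": GEOCODER_TABLE.get},
}


def tokens(text):
    """Lowercase, drop apostrophes, and split into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower().replace("'", "")))


def name_similarity(a, b):
    """Jaccard overlap of name tokens (a deliberately simple measure)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def applicable_sources(rec_a, rec_b):
    """Pick the secondary sources whose input attribute both records have."""
    return [
        s for s in SECONDARY_SOURCES.values()
        if s["input"] in rec_a and s["input"] in rec_b
    ]


def link(records_a, records_b, use_secondary, threshold=0.8):
    """Return (id_a, id_b) pairs whose combined score reaches the threshold."""
    links = []
    for ra in records_a:
        for rb in records_b:
            score = name_similarity(ra["name"], rb["name"])
            if use_secondary:
                for source in applicable_sources(ra, rb):
                    va = source["lookup"](ra[source["input"]])
                    vb = source["lookup"](rb[source["input"]])
                    if va is not None and vb is not None:
                        # Equal weighting is an arbitrary choice for this sketch.
                        agreement = 1.0 if va == vb else 0.0
                        score = 0.5 * score + 0.5 * agreement
            if score >= threshold:
                links.append((ra["id"], rb["id"]))
    return links


print(link(SOURCE_A, SOURCE_B, use_secondary=False))  # [('a1', 'b2')]
print(link(SOURCE_A, SOURCE_B, use_secondary=True))   # [('a1', 'b1')]
```

With names alone, the sketch links a1 to the identically named record at a different address. Once the stand-in geocoder resolves both address spellings to the same coordinates, it links a1 to b1 instead.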

Original language: English (US)
Pages (from-to): 33-44
Number of pages: 12
Journal: AI Magazine
Volume: 26
Issue number: 1
State: Published - Mar 1 2005

Fingerprint

Data integration
Semantic Web
XML
Web services

Cite this

Automatically utilizing secondary sources to align information across sources. / Michalowski, Martin; Thakkar, Snehal; Knoblock, Craig A.

In: AI Magazine, Vol. 26, No. 1, 01.03.2005, p. 33-44.


Michalowski, Martin ; Thakkar, Snehal ; Knoblock, Craig A. / Automatically utilizing secondary sources to align information across sources. In: AI Magazine. 2005 ; Vol. 26, No. 1. pp. 33-44.
@article{fc22a0bc3d8146f88a10c2164b39c2cd,
title = "Automatically utilizing secondary sources to align information across sources",
abstract = "XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.",
author = "Martin Michalowski and Snehal Thakkar and Knoblock, {Craig A.}",
year = "2005",
month = "3",
day = "1",
language = "English (US)",
volume = "26",
pages = "33--44",
journal = "AI Magazine",
issn = "0738-4602",
publisher = "AI Access Foundation",
number = "1",

}

TY - JOUR

T1 - Automatically utilizing secondary sources to align information across sources

AU - Michalowski, Martin

AU - Thakkar, Snehal

AU - Knoblock, Craig A.

PY - 2005/3/1

Y1 - 2005/3/1

N2 - XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.

AB - XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.

UR - http://www.scopus.com/inward/record.url?scp=17244376008&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17244376008&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:17244376008

VL - 26

SP - 33

EP - 44

JO - AI Magazine

JF - AI Magazine

SN - 0738-4602

IS - 1

ER -