A Data Quality Ontology for the Secondary Use of EHR Data

Research output: Contribution to journal › Article › peer-review

34 Scopus citations


The secondary use of EHR data for research is expected to improve health outcomes for patients, but the benefits will be realized only if the data in the EHR is of sufficient quality to support these uses. A data quality (DQ) ontology was developed to rigorously define concepts and enable automated computation of data quality measures. The healthcare data quality literature was mined for the important terms used to describe data quality concepts, and these terms were harmonized into an ontology. Four high-level data quality dimensions ("correctness", "consistency", "completeness" and "currency") categorize 19 lower-level measures. The ontology serves as an unambiguous vocabulary that defines concepts more precisely than natural language; it provides a mechanism to automatically compute data quality measures; and it is reusable across domains and use cases. A detailed example is presented to demonstrate its utility. The DQ ontology can make data validation more common and reproducible.

Original language: English (US)
Pages (from-to): 1937-1946
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
State: Published - Jan 1 2015

