We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] = 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
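As a reading aid, the reported odds ratios and p-values can be tabulated and compared against a significance threshold. The sketch below is illustrative only: the numbers are copied from the abstract, but the threshold of 5 × 10⁻⁸ is the conventional genome-wide cutoff and is an assumption here, since the abstract does not state which cutoff the study used. The PheWAS would normally use its own multiple-testing correction, which is also not given. The names `REPORTED`, `GENOME_WIDE_ALPHA`, and `summarize` are hypothetical.

```python
# Associations reported for the 9q22 locus (rs7850258), copied from the abstract
# as (label, odds ratio, p-value).
REPORTED = [
    ("hypothyroidism (primary GWAS)", 0.74, 3.96e-9),
    ("hypothyroidism (replication)", 0.60, 5.7e-6),
    ("hypothyroidism (PheWAS)", 0.76, 2.7e-13),
    ("thyroiditis", 0.58, 1.4e-5),
    ("nodular goiter", 0.76, 3.1e-5),
    ("multinodular goiter", 0.69, 3.9e-5),
    ("thyrotoxicosis", 0.76, 1.5e-3),
    ("Graves disease", 1.03, 0.82),
    ("thyroid cancer", 1.29, 0.09),
]

# Assumption: the conventional genome-wide threshold, not a value from the abstract.
GENOME_WIDE_ALPHA = 5e-8


def summarize(rows, alpha=GENOME_WIDE_ALPHA):
    """Return (label, direction, passes_alpha) for each reported association."""
    summary = []
    for label, odds_ratio, p_value in rows:
        # An odds ratio below 1 means lower odds of the phenotype for the tested allele.
        direction = "OR < 1" if odds_ratio < 1 else "OR >= 1"
        summary.append((label, direction, p_value < alpha))
    return summary


if __name__ == "__main__":
    for label, direction, passes in summarize(REPORTED):
        print(f"{label:32s} {direction:8s} below {GENOME_WIDE_ALPHA:g}: {passes}")
```

Under this assumed cutoff, only the primary GWAS result and the PheWAS hypothyroidism result fall below the threshold. The replication and the other PheWAS phenotypes would be judged against less stringent criteria, because they test a single pre-selected locus.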
Bibliographical note
Funding Information:
The eMERGE Network was initiated and funded by the National Human Genome Research Institute (NHGRI) through the following grants: U01-HG-004610 (Group Health Cooperative); U01-HG-004608 (Marshfield Clinic); U01-HG-004599 (Mayo Clinic); U01-HG-004609 (Northwestern University); and U01-HG-004603 (Vanderbilt University, also serving as the Coordinating Center). The National Institute of General Medical Sciences (NIGMS) provided additional funding to the eMERGE Network through these grants. The Mayo Genome Consortia are supported by the Mayo Foundation, Mayo Clinic Genome-wide Association Study of Venous Thromboembolism (HG-004735) from the NHGRI (GENEVA Consortium), and the Mayo Clinic SPORE in Pancreatic Cancer (P50CA102701) from the National Cancer Institute.