A study paradigm integrating prospective epidemiologic cohorts and electronic health records to identify disease biomarkers

Jonathan D. Mosley, Qi Ping Feng, Quinn S. Wells, Sara L. Van Driest, Christian M. Shaffer, Todd L. Edwards, Lisa Bastarache, Wei Qi Wei, Lea K. Davis, Catherine A. McCarty, Will Thompson, Christopher G. Chute, Gail P. Jarvik, Adam S. Gordon, Melody R. Palmer, David R. Crosslin, Eric B. Larson, David S. Carrell, Iftikhar J. Kullo, Jennifer A. Pacheco, Peggy L. Peissig, Murray H. Brilliant, James G. Linneman, Bahram Namjou, Marc S. Williams, Marylyn D. Ritchie, Kenneth M. Borthwick, Shefali S. Verma, Jason H. Karnes, Scott T. Weiss, Thomas J. Wang, C. Michael Stein, Josh C. Denny, Dan M. Roden

Research output: Contribution to journal > Article > peer-review

12 Scopus citations


Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities (ARIC) study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations.
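
The scoring and thresholding steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the SNP weights are taken as given (the study derives them with Bayesian sparse linear mixed modeling, which is not reproduced here), the function names are hypothetical, and treating the Bonferroni family as 53 biomarkers × 1139 diagnoses at alpha = 0.05 is an assumption, since the abstract does not state how the correction was counted. The FDR step uses the standard Benjamini-Hochberg procedure as a stand-in for whatever FDR method the study applied.

```python
from typing import List, Sequence

N_BIOMARKERS = 53    # stated in the abstract
N_DIAGNOSES = 1139   # stated in the abstract


def predicted_biomarker(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Genetically predicted biomarker value: weighted sum of SNP allele dosages.

    `weights` are assumed to come from a previously fitted model.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under a Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[int]:
    """Indices of hypotheses rejected at FDR level q (Benjamini-Hochberg)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= q * k / m; reject ranks 1..k.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Assumed test family: every biomarker against every diagnosis.
threshold = bonferroni_threshold(0.05, N_BIOMARKERS * N_DIAGNOSES)
```

Under these assumptions the Bonferroni threshold is 0.05 / 60367, on the order of 8 × 10⁻⁷. The FDR procedure is less conservative, which is consistent with the abstract's report that an FDR-based threshold reveals additional associations.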

Original language: English (US)
Article number: 3522
Journal: Nature Communications
Issue number: 1
State: Published - Dec 1 2018

Bibliographical note

Funding Information:
The authors thank the staff and participants of the ARIC study for their contributions. The authors wish to acknowledge the expert technical support of the VANTAGE and VANGARD core facilities, supported in part by the Vanderbilt-Ingram Cancer Center and Vanderbilt Vision Center. This work was supported by a career development award from the Vanderbilt Faculty Research Scholars Fund (J.D.M.), American Heart Association (16FTF30130005) (J.D.M.), PGRN (P50-GM115305), R01 GM10945, R01 LM010685, R01 HL133786-01A1 (W.W.), R01 GM120523 (Q.F.), 16SDG29090005 (J.H. K.). SLV has support through Burroughs Wellcome Fund IRSA 1015006 and a CTSA award KL2TR000446 (NCATS/NIH). VUMC’s BioVU projects are supported by numerous sources: institutional funding, private agencies, and federal grants. These include the NIH funded Shared Instrumentation Grant S10RR025141; CTSA grants UL1TR002243, UL1TR000445, and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962, R01HD074711; and other funding sources listed at https://victr.vanderbilt.edu/pub/biovu/?sid=229. The eMERGE Network was initiated and funded by NHGRI through the following grants: U01HG006830 (Children’s Hospital of Philadelphia); U01HG006389 (Essentia Institute of Rural Health, Marshfield Clinic Research Foundation and Pennsylvania State University); U01HG006382 (Geisinger Clinic); U01HG006375 (Kaiser Permanente/University of Washington, Seattle); U01HG006379 (Mayo Clinic); U01HG006380 (Icahn School of Medicine at Mount Sinai); U01HG006388 (Northwestern University); U01HG006378; U01HG8685 (Brigham and Women’s Hospital); U01HG8672 (Vanderbilt University Medical Center); and U01HG006385 (Vanderbilt University Medical Center serving as the Coordinating Center); U01HG004438 (CIDR) and U01HG004424 (the Broad Institute) serving as Genotyping Centers. 
ARIC is supported by NHLBI contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C). ARIC/GENEVA was supported by NHGRI grant U01HG004402 (E. Boerwinkle).

Publisher Copyright:
© 2018, The Author(s).

