A System for Phenotype Harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) Program

Adrienne M Stilp, Leslie S Emery, Jai G Broome, Erin J Buth, Alyna T Khan, Cecelia A Laurie, Fei Fei Wang, Quenna Wong, Dongquan Chen, Catherine M D'Augustine, Nancy L Heard-Costa, Chancellor R Hohensee, William Craig Johnson, Lucia D Juarez, Jingmin Liu, Karen M Mutalik, Laura M Raffield, Kerri L Wiggins, Paul S de Vries, Tanika N Kelly, Charles Kooperberg, Pradeep Natarajan, Gina M Peloso, Patricia A Peyser, Alex P Reiner, Donna K Arnett, Stella Aslibekyan, Kathleen C Barnes, Lawrence F Bielak, Joshua C Bis, Brian E Cade, Ming-Huei Chen, Adolfo Correa, L Adrienne Cupples, Mariza de Andrade, Patrick T Ellinor, Myriam Fornage, Nora Franceschini, Weiniu Gan, Santhi K Ganesh, Jan Graffelman, Megan L Grove, Xiuqing Guo, Nicola L Hawley, Wan-Ling Hsu, Rebecca D Jackson, Cashell E Jaquish, Andrew D Johnson, Sharon L R Kardia, Shannon Kelly, Jiwon Lee, Rasika A Mathias, Stephen T McGarvey, Braxton D Mitchell, May E Montasser, Alanna C Morrison, Kari E North, Seyed Mehdi Nouraie, Elizabeth C Oelsner, Nathan Pankratz, Stephen S Rich, Jerome I Rotter, Jennifer A Smith, Kent D Taylor, Ramachandran S Vasan, Daniel E Weeks, Scott T Weiss, Carla G Wilson, Lisa R Yanek, Bruce M Psaty, Susan R Heckbert, Cathy C Laurie

Research output: Contribution to journal › Article › peer-review

Abstract

Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948-2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.

Original language: English (US)
Pages (from-to): 1977-1992
Number of pages: 16
Journal: American Journal of Epidemiology
Volume: 190
Issue number: 10
State: Published - Oct 1 2021

Bibliographical note

Funding Information:
This work was funded by numerous grants and contracts from the National Institutes of Health (NIH), US Department of Health and Human Services. The Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung, and Blood Institute (NHLBI), NIH, with core services provided by the TOPMed Informatics Research Center (award 3R01HL-117626-02S1; contract HHSN268201800002I) and the TOPMed Data Coordinating Center (awards R01HL-120393 and U01HL-120393; contract HHSN268201800001I). Whole-genome sequencing for TOPMed was supported by the NHLBI. Phenotype harmonization activities were funded in part by the NHLBI (contract HHSN26820180001I). Additional harmonization funding was provided by the NHLBI (grant 5 U01 HL 120393-04). Phenotype variable tagging was funded by the NHLBI (grant supplement 3 U01 HL 120393-04S2) and the NIH Office of the Director as part of the NIH Data Commons Pilot Phase Consortium. Additional financial support was provided to some authors: N.F. was additionally supported by NIH grants R01-MD012765, R01-DK117445, and R21-HL140385. P.T.E. was additionally supported by NIH grants R01HL092577, R01HL128914, and K24HL105780. A.P.R. was additionally supported by NIH grant R01HL130733. P.S.d.V. was additionally supported by American Heart Association grant 18CDA34110116. E.C.O. was additionally supported by the NHLBI Pooled Cohorts Study and NIH grants R21-HL129924 and K23-HL130627. S.K.G. was additionally supported by NIH grants R01HL122684 and R01HL139672. B.E.C. was additionally supported by NIH grant K01-HL135405. P.N. and G.M.P. were additionally supported by NIH grant R01HL142711. R.S.V. was supported in part by the Evans Medical Foundation and the Jay and Louis Coffman Endowment from the Department of Medicine, Boston University School of Medicine. 
Financial support for individual TOPMed studies was provided by the following: Genetics of Cardiometabolic Health in the Amish: The TOPMed component of the Amish Research Program was supported by NIH grants R01 HL121007, U01 HL072515, and R01 AG18728. Atherosclerosis Risk in Communities (ARIC) Study: The ARIC Study has been funded in whole or in part by the NHLBI (contracts HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, and HHSN268201700005I). Coronary Artery Risk Development in Young Adults (CARDIA) Study: The CARDIA Study is conducted and supported by the NHLBI in collaboration with the University of Alabama at Birmingham (awards HHSN268201800005I and HHSN268201800007I), Northwestern University (award HHSN268201800003I), the University of Minnesota (award HHSN268201800006I), and the Kaiser Foundation Research Institute (award HHSN268201800004I). CARDIA is also partially supported by the Intramural Research Program of the National Institute on Aging and an Intra-Agency Agreement (agreement AG0005) between the National Institute on Aging and the NHLBI. Cleveland Family Study: The Cleveland Family Study has been supported in part by the NIH (grants R01-HL046380, KL2-RR024990, R35-HL135818, and R01-HL113338). Cardiovascular Health Study (CHS): The CHS was supported by contracts HHSN268201200036C, HHSN268200800007C, HHSN268201800001C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, and N01HC85086 and grants U01HL080295 and U01HL130114 from the NHLBI, with additional contributions from the National Institute of Neurological Disorders and Stroke. Additional support was provided by the National Institute on Aging (award R01AG023629). Genetic Epidemiology of COPD Study (COPDGene): The COPDGene project was supported by awards U01 HL089897 and U01 HL089856 from the NHLBI. 
The COPDGene project is also supported by the COPD Foundation through contributions made to an industry advisory board comprised of AstraZeneca AB (Cambridge, United Kingdom), Boehringer Ingelheim (Ingelheim am Rhein, Germany), GlaxoSmithKline plc (London, United Kingdom), Novartis International AG (Basel, Switzerland), Pfizer, Inc. (New York, New York), Siemens Healthcare GmbH (Erlangen, Germany), and Sunovion Pharmaceuticals Inc. (Marlborough, Massachusetts). Genetic Epidemiology of Asthma in Costa Rica (CRA) Study: The CRA Study was funded by the NHLBI (grants R37 HL066289-14 and P01 HL132825). Framingham Heart Study: The Framingham Heart Study was supported by contracts NO1-HC-25195, HHSN268201500001I, and 75N92019D00031 from the NHLBI and by grant supplement R01 HL092577-06S1 for this research. Genetic Epidemiology Network of Arteriopathy (GENOA): Support for GENOA was provided by the NHLBI (awards HL054457, HL054464, HL054481, HL119443, HL087660, and HL085571). Genetics of Lipid Lowering Drugs and Diet Network (GOLDN): GOLDN biospecimens, baseline phenotype data, and intervention phenotype data were collected with funding from the NHLBI (grant U01 HL072524). Whole-genome sequencing in GOLDN was funded by the NHLBI (grant R01 HL104135 and grant supplement R01 HL104135-04S1). Hispanic Community Health Study/Study of Latinos (HCHS/SOL): The HCHS/SOL is a collaborative study supported by contracts between the NHLBI and the University of North Carolina (contract HHSN268201300001I/N01-HC-65233), the University of Miami (contract HHSN268201300004I/N01-HC-65234), the Albert Einstein College of Medicine (contract HHSN268201300002I/N01-HC-65235), and the University of Illinois at Chicago (contract HHSN268201300003I/N01-HC-65236 Northwestern University), and San Diego State University (contract HHSN268201300005I/N01-HC-65237). 
The following institutions have contributed to the HCHS/SOL through a transfer of funds to the NHLBI: the National Institute on Minority Health and Health Disparities, the National Institute on Deafness and Other Communication Disorders, the National Institute of Dental and Craniofacial Research, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the NIH Office of Dietary Supplements. Heart and Vascular Health Study: The Heart and Vascular Health Study was supported by the NHLBI (grants HL068986, HL085251, HL095080, and HL073410). Jackson Heart Study: The Jackson Heart Study is supported by and conducted in collaboration with Jackson State University (contract HHSN268201800013I), Tougaloo College (contract HHSN268201800014I), the Mississippi State Department of Health (contract HHSN268201800015I), and the University of Mississippi Medical Center (contracts HHSN268201800010I, HHSN268201800011I, and HHSN268201800012I) through contracts from the NHLBI and the National Institute on Minority Health and Health Disparities. Mayo Clinic Venous Thromboembolism Study: The Mayo Clinic Venous Thromboembolism Study was funded, in part, by the NHLBI (grants HL66216 and HL83141), the National Human Genome Research Institute (grants HG04735 and HG06379), and the Mayo Foundation. Multi-Ethnic Study of Atherosclerosis (MESA): Whole-genome sequencing for MESA (dbGaP accession number phs001416.v1.p1) was performed at the Broad Institute of MIT and Harvard (Cambridge, Massachusetts) (award 3U54HG003067-13S1). Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (award 3R01HL-117626-02S1). Phenotype harmonization, data management, sample-identity quality control, and general study coordination were provided by the TOPMed Data Coordinating Center (award 3R01HL-120393-02S1). 
MESA and the MESA SHARe project are conducted and supported by the NHLBI in collaboration with the MESA investigators. Support for MESA is provided by NIH contracts 75N92020D00001 (NHLBI), HHSN268201500003I (NHLBI), N01-HC-95159 (NHLBI), 75N92020D00005 (NHLBI), N01-HC-95160 (NHLBI), 75N92020D00002 (NHLBI), N01-HC-95161 (NHLBI), 75N92020D00003 (NHLBI), N01-HC-95162 (NHLBI), 75N92020D00006 (NHLBI), N01-HC-95163 (NHLBI), 75N92020D00004 (NHLBI), N01-HC-95164 (NHLBI), 75N92020D00007 (NHLBI), N01-HC-95165 (NHLBI), N01-HC-95166 (NHLBI), N01-HC-95167 (NHLBI), N01-HC-95168 (NHLBI), N01-HC-95169 (NHLBI), UL1-TR-000040 (National Center for Advancing Translational Sciences (NCATS) (Clinical and Translational Science Institute (CTSI))), UL1-TR-001079 (NCATS (CTSI)), UL1-TR-001420 (NCATS (CTSI)), UL1-TR-001881 (NCATS (CTSI)), and DK063491 (National Institute of Diabetes and Digestive and Kidney Diseases). Funding for SHARe genotyping was provided by NHLBI contract N02-HL-64278. Genotyping was performed at Affymetrix, Inc. (Santa Clara, California) and the Broad Institute of MIT and Harvard using the Affymetrix Genome-Wide Human SNP Array 6.0. Samoan Adiposity Study: Data collection for the Samoan Adiposity Study was funded by NIH grant R01-HL093093. Women's Health Initiative: The Women's Health Initiative program is funded by the NHLBI (contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C).

The harmonized data presented in this paper have been submitted to the database of Genotypes and Phenotypes (dbGaP) and the NHLBI BioData Catalyst. The software code with which to reproduce the harmonized phenotypes presented in this paper from dbGaP files is available on GitHub (14). See the “Data Availability” section of the text for details. We gratefully acknowledge the researchers and study participants who provided biological samples and data for TOPMed. We acknowledge Drs. Mike Feolo and Masato Kimura for making the harmonized phenotype data and the phenotype tagging data available in dbGaP. 
We also acknowledge contributors to the overall TOPMed project, who can be found on the TOPMed Data Coordinating Center website (https://www.nhlbiwgs.org/topmed-banner-authorship). The Genetics of Cardiometabolic Health in the Amish Study investigators gratefully thank the Amish community and research volunteers for their long-standing partnership, and they acknowledge the dedication of their Amish liaisons, fieldworkers, and the Amish Research Clinic staff, without whom these studies would not have been possible. The ARIC Study investigators thank the study staff and participants for their important contributions. The Framingham Heart Study investigators acknowledge the dedication of the study participants, without whom this research would not have been possible. The Jackson Heart Study investigators thank the study staff and participants. The Samoan Adiposity Study investigators thank the Samoan participants in the study and local village authorities. They acknowledge the Samoan Ministry of Health and the Samoa Bureau of Statistics for their support of this research.

Funding Information:
A.M.S. receives funding from Seven Bridges Genomics Inc. (Charlestown, Massachusetts) to develop tools for the NHLBI BioData Catalyst consortium. B.M.P. reports serving on the Steering Committee of the Yale Open Data Access Project, which is funded by Johnson & Johnson (New Brunswick, New Jersey). S.A. reports being employed by and holding equity in 23andMe, Inc. (Sunnyvale, California). P.N. reports conflicts of interest unrelated to this work: grant support from Amgen, Inc. (Thousand Oaks, California), Apple Inc. (Cupertino, California), and Boston Scientific Corporation (Marlborough, Massachusetts) and consulting fees from Apple. M.E.M. receives funding from Regeneron Pharmaceuticals Inc. (Tarrytown, New York) unrelated to this work. The other authors have no potential conflicts of interest to declare.

Publisher Copyright:
© The Author(s) 2021.

Keywords

  • cardiovascular disease
  • common data elements
  • hematologic disease
  • information dissemination
  • lung diseases
  • phenotypes
  • sleep-wake disorders
  • Precision Medicine/methods
  • Data Aggregation
  • National Heart, Lung, and Blood Institute (U.S.)
  • United States
  • Humans
  • Information Dissemination
  • Phenotype
  • Genetic Association Studies/methods
  • Phenomics/methods
  • Program Evaluation

PubMed: MeSH publication types

  • Research Support, Non-U.S. Gov't
  • Journal Article
  • Research Support, N.I.H., Extramural
