Medical records-based chronic kidney disease phenotype for clinical care and “big data” observational and genetic studies

Ning Shang, Atlas Khan, Fernanda Polubriaginof, Francesca Zanoni, Karla Mehl, David Fasel, Paul E. Drawz, Robert J. Carrol, Joshua C. Denny, Matthew A. Hathcock, Adelaide M. Arruda-Olson, Peggy L. Peissig, Richard A. Dart, Murray H. Brilliant, Eric B. Larson, David S. Carrell, Sarah Pendergrass, Shefali Setia Verma, Marylyn D. Ritchie, Barbara Benoit, Vivian S. Gainer, Elizabeth W. Karlson, Adam S. Gordon, Gail P. Jarvik, Ian B. Stanaway, David R. Crosslin, Sumit Mohan, Iuliana Ionita-Laza, Nicholas P. Tatonetti, Ali G. Gharavi, George Hripcsak, Chunhua Weng, Krzysztof Kiryluk

Research output: Contribution to journal › Article › peer-review

13 Scopus citations


Chronic Kidney Disease (CKD) represents a slowly progressive disorder that is typically silent until late stages, but early intervention can significantly delay its progression. We designed a portable and scalable electronic CKD phenotype to facilitate early disease recognition and empower large-scale observational and genetic studies of kidney traits. The algorithm uses a combination of rule-based and machine-learning methods to automatically place patients on the staging grid of albuminuria by glomerular filtration rate (“A-by-G” grid). We manually validated the algorithm by 451 chart reviews across three medical systems, demonstrating overall positive predictive value of 95% for CKD cases and 97% for healthy controls. Independent case-control validation using 2350 patient records demonstrated diagnostic specificity of 97% and sensitivity of 87%. Application of the phenotype to 1.3 million patients demonstrated that over 80% of CKD cases are undetected using ICD codes alone. We also demonstrated several large-scale applications of the phenotype, including identifying stage-specific kidney disease comorbidities, in silico estimation of kidney trait heritability in thousands of pedigrees reconstructed from medical records, and biobank-based multicenter genome-wide and phenome-wide association studies.
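The grid placement described in the abstract can be illustrated with a minimal Python sketch. It covers only the final lookup step, assuming that an estimated GFR and a urine albumin-to-creatinine ratio are already available for the patient, and it uses the standard KDIGO category cut-offs, which the abstract itself does not list. The function names are hypothetical, and the published algorithm's additional rule-based and machine-learning components are not reproduced here.

```python
from typing import Optional, Tuple


def gfr_category(egfr: float) -> str:
    """Return the KDIGO G category for an eGFR in mL/min/1.73 m^2."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def albuminuria_category(uacr: float) -> str:
    """Return the KDIGO A category for a urine albumin-to-creatinine ratio in mg/g."""
    if uacr < 30:
        return "A1"
    if uacr <= 300:
        return "A2"
    return "A3"


def a_by_g_cell(egfr: float, uacr: Optional[float] = None) -> Tuple[str, Optional[str]]:
    """Place one patient on the A-by-G grid.

    The albuminuria axis is left as None when no urine test is available
    (an illustrative convention, not taken from the paper).
    """
    a_cat = albuminuria_category(uacr) if uacr is not None else None
    return gfr_category(egfr), a_cat


if __name__ == "__main__":
    print(a_by_g_cell(40, 120))
    print(a_by_g_cell(95))
```

Returning `None` for the albuminuria axis keeps patients with a missing urine test on the grid by GFR alone; whether and how the original phenotype handles such cases is not stated in the abstract.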

Original language: English (US)
Article number: 70
Journal: npj Digital Medicine
Issue number: 1
State: Published - Dec 2021

Bibliographical note

Funding Information:
The eMERGE Phase III Network was initiated and funded by the National Human Genome Research Institute (NHGRI) through the following grants: U01HG8680 (Columbia University Health Sciences), U01HG8672 (Vanderbilt University Medical Center), U01HG8657 (Kaiser Permanente Washington Health Research Institute/University of Washington), U01HG8685 (Brigham and Women’s Hospital), U01HG8666 (Cincinnati Children’s Hospital Medical Center), U01HG6379 (Mayo Clinic), U01HG8679 (Geisinger Clinic), U01HG8684 (Children’s Hospital of Philadelphia), U01HG8673 (Northwestern University), MD007593 (Meharry Medical College), U01HG8676 (Partners Healthcare/Broad Institute), and U01HG8664 (Baylor College of Medicine). This work was also funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), Kidney Precision Medicine Project (KPMP grant UH3DK114926), the National Library of Medicine grant R01LM013061, and the Precision Medicine Pilot from the Irving Institute/Columbia CTSA (UL1TR001873). Additional sources of funding included R01DK105124 (K.K.), RC2DK116690 (K.K.), and R01LM006910 (G.H.).

Publisher Copyright:
© 2021, The Author(s).

PubMed: MeSH publication types

  • Journal Article

