Predicting DNA methylation from genetic data lacking racial diversity using shared classified random effects

J. Sunil Rao, Hang Zhang, Erin Kobetz, Melinda C. Aldrich, Douglas Conway

Research output: Contribution to journal > Article > peer-review

2 Scopus citations


Public genomic repositories are notoriously lacking in racially and ethnically diverse samples. This limits the scope of exploration and has in fact been one of the driving factors behind the initiation of the All of Us project. Our particular focus here is to provide a model-based framework for accurately predicting DNA methylation from genetic data using racially sparse public repository data. Epigenetic alterations are of great interest in cancer research, but public repository data are limited in the information they provide, whereas genetic data are more plentiful. Our phenotype of interest is cervical cancer in The Cancer Genome Atlas (TCGA) repository. Being able to generate such predictions would nicely complement other work that has generated gene-level predictions of gene expression for normal samples. We develop a new prediction approach which uses shared random effects from a nested error mixed effects regression model. The sharing of random effects allows borrowing of strength across racial groups, greatly improving predictive accuracy. Additionally, we show how to further borrow strength by combining data from different cancers in TCGA, even though the focus of our predictions is DNA methylation in cervical cancer. We compare our methodology against other popular approaches, including the elastic net shrinkage estimator and random forest prediction. Results are very encouraging, with the shared classified random effects approach uniformly producing more accurate predictions, both overall and for each racial group.

Original language: English (US)
Pages (from-to): 1018-1028
Number of pages: 11
Issue number: 1
State: Published - Jan 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2020 Elsevier Inc.


Keywords

  • DNA methylation
  • Mixed effects models
  • Prediction
  • Racial diversity

