TY - JOUR
T1 - A Pan-Cancer and Polygenic Bayesian Hierarchical Model for the Effect of Somatic Mutations on Survival
AU - Samorodnitsky, Sarah
AU - Hoadley, Katherine A.
AU - Lock, Eric F.
N1 - Publisher Copyright: © The Author(s) 2020.
PY - 2020
Y1 - 2020
AB - We built a novel Bayesian hierarchical survival model based on the somatic mutation profiles of patients across 50 genes and 27 cancer types. Its pan-cancer design allows the model to “borrow” information across cancer types, motivated by the assumption that similar mutation profiles may have similar (but not necessarily identical) effects on survival across different tissues of origin or tumor types. The effect of a mutation in each gene was allowed to vary by cancer type, whereas the mean effect of each gene was shared across cancers. Within this framework, we considered four parametric survival models (normal, log-normal, exponential, and Weibull) and compared their performance via cross-validation: we fit each model on training data and estimated the log-posterior predictive likelihood on test data. The log-normal model gave the best fit, and we then investigated the partial effect of each gene on survival via a forward selection procedure. Through this procedure we determined that mutations in TP53 and FAT4, taken together, were the most useful for predicting patient survival. We validated the model via simulation to confirm that our algorithm for posterior computation achieved nominal coverage rates. The code used for this analysis can be found at https://github.com/sarahsamorodnitsky/Pan-Cancer-Survival-Modeling.git, and the results are summarized at http://ericfrazerlock.com/surv_figs/SurvivalDisplay.html.
KW - Bayesian hierarchical modeling
KW - The Cancer Genome Atlas
KW - pan-cancer modeling
KW - survival analysis
UR - http://www.scopus.com/inward/record.url?scp=85081357099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081357099&partnerID=8YFLogxK
U2 - 10.1177/1176935120907399
DO - 10.1177/1176935120907399
M3 - Article
C2 - 32116467
AN - SCOPUS:85081357099
SN - 1176-9351
VL - 19
JO - Cancer Informatics
JF - Cancer Informatics
ER -