TY - JOUR
T1 - A multilevel Bayesian approach to improve effect size estimation in regression modeling of metabolomics data utilizing imputation with uncertainty
AU - Gillies, Christopher E.
AU - Jennaro, Theodore S.
AU - Puskarich, Michael A.
AU - Sharma, Ruchi
AU - Ward, Kevin R.
AU - Fan, Xudong
AU - Jones, Alan E.
AU - Stringer, Kathleen A.
N1 - Publisher Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2020/8
Y1 - 2020/8
N2 - To ensure scientific reproducibility of metabolomics data, alternative statistical methods are needed. A paradigm shift away from the p-value toward an embracement of uncertainty and interval estimation of a metabolite’s true effect size may lead to improved study design and greater reproducibility. Multilevel Bayesian models are one approach that offers the added opportunity of incorporating imputed value uncertainty when missing data are present. We designed simulations of metabolomics data to compare multilevel Bayesian models to standard logistic regression with corrections for multiple hypothesis testing. Our simulations altered the sample size and the fraction of significant metabolites truly different between two outcome groups. We then introduced missingness to further assess model performance. Across simulations, the multilevel Bayesian approach more accurately estimated the effect size of metabolites that were significantly different between groups. Bayesian models also had greater power and mitigated the false discovery rate. In the presence of increased missing data, Bayesian models were able to accurately impute the true concentration, and incorporating the uncertainty of these estimates improved overall prediction. In summary, our simulations demonstrate that a multilevel Bayesian approach accurately quantifies the estimated effect size of metabolite predictors in regression modeling, particularly in the presence of missing data.
AB - To ensure scientific reproducibility of metabolomics data, alternative statistical methods are needed. A paradigm shift away from the p-value toward an embracement of uncertainty and interval estimation of a metabolite’s true effect size may lead to improved study design and greater reproducibility. Multilevel Bayesian models are one approach that offers the added opportunity of incorporating imputed value uncertainty when missing data are present. We designed simulations of metabolomics data to compare multilevel Bayesian models to standard logistic regression with corrections for multiple hypothesis testing. Our simulations altered the sample size and the fraction of significant metabolites truly different between two outcome groups. We then introduced missingness to further assess model performance. Across simulations, the multilevel Bayesian approach more accurately estimated the effect size of metabolites that were significantly different between groups. Bayesian models also had greater power and mitigated the false discovery rate. In the presence of increased missing data, Bayesian models were able to accurately impute the true concentration, and incorporating the uncertainty of these estimates improved overall prediction. In summary, our simulations demonstrate that a multilevel Bayesian approach accurately quantifies the estimated effect size of metabolite predictors in regression modeling, particularly in the presence of missing data.
KW - Bayesian statistics
KW - Hierarchical modeling
KW - Imputation
KW - Missing values
KW - Multiple test corrections
KW - Nuclear magnetic resonance spectroscopy
UR - http://www.scopus.com/inward/record.url?scp=85090638958&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090638958&partnerID=8YFLogxK
U2 - 10.3390/metabo10080319
DO - 10.3390/metabo10080319
M3 - Article
C2 - 32781624
AN - SCOPUS:85090638958
SN - 2218-1989
VL - 10
SP - 1
EP - 19
JO - Metabolites
JF - Metabolites
IS - 8
M1 - 319
ER -