There is a growing demand for methods to determine the effects that chemical mixtures have on human health. One statistical challenge is identifying true “bad actors” within a mixture of highly correlated predictors, a setting in which estimates from standard approaches such as linear regression become highly variable. Weighted Quantile Sum regression has been proposed to address this problem through a two-step process: mixture component weights are estimated using bootstrap aggregation in a training dataset, and inference on the overall mixture effect is carried out in a held-out test set. Weighted Quantile Sum regression is popular in applied papers, but its reliance on data splitting is suboptimal, and analysts who use the same data for both steps risk inflating the Type I error rate. We therefore propose a modification of Weighted Quantile Sum regression that uses a permutation test for inference, which allows the weights to be estimated using the entire dataset while preserving the nominal Type I error rate. To minimize computational burden, we propose replacing the bootstrap with L1 or L2 penalization and describe how to choose the appropriate penalty given expert knowledge about a mixture of interest. We apply our method to a national pregnancy cohort study of prenatal phthalate exposure and child health outcomes.
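The general idea of a permutation test around a data-adaptive weighted index can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a continuous outcome with no covariates, it assumes the mixture acts in the positive direction, and it replaces the bootstrap or penalized weight estimation with a deliberately simple stand-in (weights proportional to the positive correlations between each quantile-scored component and the outcome). All function names are hypothetical. The point it shows is that the weights are re-estimated on the full data inside every permutation, so the null distribution reflects the weight-estimation step and no held-out set is needed.

```python
import random


def quantile_scores(values, n_quantiles=4):
    """Map each value to a rank-based quantile score in 0..n_quantiles-1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_quantiles // len(values)
    return scores


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def estimate_weights(q_cols, y):
    """Stand-in estimator (an assumption, not the paper's method):
    non-negative weights proportional to positive correlations, summing to 1."""
    raw = [max(pearson(col, y), 0.0) for col in q_cols]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(q_cols)] * len(q_cols)
    return [r / total for r in raw]


def mixture_statistic(q_cols, y):
    """Estimate weights on all rows, build the weighted index, and
    return its correlation with the outcome together with the weights."""
    w = estimate_weights(q_cols, y)
    index = [sum(wj * col[i] for wj, col in zip(w, q_cols))
             for i in range(len(y))]
    return pearson(index, y), w


def permutation_test(exposures, y, n_perm=200, seed=0):
    """One-sided permutation p-value for the overall mixture effect."""
    rng = random.Random(seed)
    q_cols = [quantile_scores(col) for col in exposures]
    observed, weights = mixture_statistic(q_cols, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # Weights are re-estimated for every permuted outcome.
        stat, _unused = mixture_statistic(q_cols, y_perm)
        if stat >= observed:
            exceed += 1
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, weights, p_value
```

Here `exposures` is a list of columns (one list of measurements per mixture component) and `y` is the outcome. Adjusting for covariates, handling other outcome types, or using penalized weights would each require changes that this sketch does not attempt.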
Bibliographical note
Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding was provided by the National Institute of Environmental Health Sciences (grant nos R01 ES016863-04, R01 ES016863-02S4, P30 ES005022, and P30 ES023515) and the National Cancer Institute (grant nos R01 CA214825 and R01 CA225190).
© The Author(s) 2022.
- Chemical mixtures
- environmental health
- phthalic acids
- regression analysis
- variable selection
PubMed: MeSH publication types
- Journal Article
- Research Support, N.I.H., Extramural