Abstract
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject’s gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with myalgic encephalomyelitis/chronic fatigue syndrome and through simulation studies.
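To make concrete why the compositional nature of the data rules out standard covariance estimators, the following minimal sketch simulates latent log-abundances, observes only their relative abundances, and compares the sample covariance of the centered log-ratio (clr) transform with the sample covariance of the latent log-abundances. It is an illustration only and is not the estimator proposed in the article. The dimensions, the AR(1)-style covariance `omega`, and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 2000  # hypothetical number of taxa and samples

# Hypothetical latent log-abundance covariance (AR(1)-style), for illustration only.
idx = np.arange(p)
omega = 0.5 ** np.abs(np.subtract.outer(idx, idx))
log_w = rng.multivariate_normal(np.zeros(p), omega, size=n)

# In practice only the composition (relative abundances) is observed.
w = np.exp(log_w)
x = w / w.sum(axis=1, keepdims=True)

# Centered log-ratio transform: subtract each sample's mean log relative abundance.
log_x = np.log(x)
clr = log_x - log_x.mean(axis=1, keepdims=True)

# Centering matrix G = I - (1/p) 1 1^T, so that clr(x) = G log(w).
G = np.eye(p) - np.ones((p, p)) / p

s_latent = np.cov(log_w, rowvar=False)  # unavailable with real data
s_clr = np.cov(clr, rowvar=False)       # computable from compositions

# The clr covariance equals G S G, not S, and its rows sum to zero (singular).
identity_gap = np.max(np.abs(s_clr - G @ s_latent @ G))
row_sum_gap = np.max(np.abs(s_clr.sum(axis=1)))
smallest_eig = np.linalg.eigvalsh(s_clr).min()

print(f"max |S_clr - G S G|      = {identity_gap:.2e}")
print(f"max |row sums of S_clr|  = {row_sum_gap:.2e}")
print(f"smallest eigenvalue      = {smallest_eig:.2e}")
```

Because the clr covariance is a centered version of the latent covariance and is singular by construction, it cannot be used directly as a positive definite estimate of the log-abundance covariance. That gap is what motivates estimators designed for compositional data.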
Original language | English (US) |
---|---|
Pages (from-to) | 1702-1748 |
Number of pages | 47 |
Journal | Electronic Journal of Statistics |
Volume | 18 |
Issue number | 1 |
State | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2024, Institute of Mathematical Statistics. All rights reserved.
Keywords
- Compositional data
- convex optimization
- covariance matrix estimation
- microbiome data analysis
- positive definiteness