Direct covariance matrix estimation with compositional data

Aaron J. Molstad, Karl Oskar Ekvall, Piotr M. Suder

Research output: Contribution to journal, Article, peer-review


Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject’s gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with myalgic encephalomyelitis/chronic fatigue syndrome and through simulation studies.
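
As an illustration of why the compositional nature of the data blocks standard estimators, the following minimal sketch simulates latent log-abundances, normalizes them to relative abundances, and applies the centered log-ratio (clr) transform. It is not the estimator proposed in the article. The covariance matrix `omega`, the dimensions `p` and `n`, and the Gaussian model for the log-abundances are assumptions chosen only for this example. The sketch relies on the standard identity clr(x) = G log(w), where G = I - 11'/p is the centering matrix, so the sample covariance of the clr-transformed data equals G S G, with S the sample covariance of the unobserved log-abundances, and is therefore singular.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 2000  # hypothetical number of taxa and samples

# Hypothetical covariance of the latent log-abundances (illustration only).
idx = np.arange(p)
omega = 0.5 ** np.abs(np.subtract.outer(idx, idx))

# Latent log-abundances, which are not observed in practice.
log_w = rng.multivariate_normal(np.zeros(p), omega, size=n)

# Observed compositions: each row is normalized to sum to 1.
w = np.exp(log_w)
x = w / w.sum(axis=1, keepdims=True)

# Centered log-ratio transform: clr(x) = log(x) - mean(log(x)) per sample.
log_x = np.log(x)
clr = log_x - log_x.mean(axis=1, keepdims=True)

# Centering matrix G = I - 11'/p, so that clr(x) = G log(w).
G = np.eye(p) - np.ones((p, p)) / p

clr_cov = np.cov(clr, rowvar=False)        # computable from observed data
latent_cov = np.cov(log_w, rowvar=False)   # not computable in practice

# The clr covariance matches G @ latent_cov @ G, not latent_cov itself.
print(np.allclose(clr_cov, G @ latent_cov @ G))
# Its smallest eigenvalue is numerically zero, so it is not positive definite.
print(np.linalg.eigvalsh(clr_cov)[0])
```

Because G annihilates the vector of ones, the covariance of the log-abundances cannot be read off the clr covariance without further structure, which is the identifiability gap that estimators for this setting have to address.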

Original language: English (US)
Pages (from-to): 1702-1748
Number of pages: 47
Journal: Electronic Journal of Statistics
Issue number: 1
State: Published - 2024

Bibliographical note

Publisher Copyright:
© 2024, Institute of Mathematical Statistics. All rights reserved.


Keywords

  • Compositional data
  • convex optimization
  • covariance matrix estimation
  • microbiome data analysis
  • positive definiteness

