Abstract
In several biomedical domains, data increasingly take the form of a multi-way array, or tensor. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints, and it is used to infer trends in species diversity for the population.
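To make the idea of multiple imputation under a CP factorization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a three-way tensor, treats jittered copies of the true factors as a hypothetical stand-in for posterior draws (no sampler is run here), and uses independent Gaussian residuals rather than the separable residual covariance named in the abstract. The function names `cp_reconstruct` and `impute_draws` are illustrative.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rank-R CP reconstruction: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def impute_draws(X_obs, mask, factor_draws, noise_sd, rng):
    """Return one completed tensor per draw of the factor matrices.

    mask is True where an entry is observed. Observed entries are kept;
    missing entries are simulated as CP mean plus Gaussian noise.
    ASSUMPTION: independent residuals, a simplification of a separable covariance.
    """
    completed = []
    for A, B, C in factor_draws:
        mean = cp_reconstruct(A, B, C)
        sim = mean + noise_sd * rng.standard_normal(mean.shape)
        completed.append(np.where(mask, X_obs, sim))
    return np.stack(completed)

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 5, 2  # e.g. subjects x features x timepoints, rank 2 (toy sizes)
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
X_true = cp_reconstruct(A, B, C)

# Remove single entries at random, plus one entire fiber.
mask = rng.random((I, J, K)) > 0.3
mask[0, 0, :] = False
X_obs = np.where(mask, X_true, np.nan)

# HYPOTHETICAL stand-in for posterior draws: small perturbations of the true factors.
factor_draws = [
    (A + 0.05 * rng.standard_normal(A.shape), B, C) for _ in range(20)
]
imputed = impute_draws(X_obs, mask, factor_draws, noise_sd=0.1, rng=rng)

# Summaries across imputations: a point estimate and a per-entry spread.
post_mean = imputed.mean(axis=0)
post_sd = imputed.std(axis=0)
```

Any downstream statistic (for example, a diversity index per timepoint) can be computed on each completed tensor in `imputed`, and its spread across draws then reflects the imputation uncertainty, which is what a single point estimate cannot provide.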
Field | Value |
---|---|
Original language | English (US) |
Article number | kxae047 |
Journal | Biostatistics |
Volume | 26 |
Issue number | 1 |
DOIs | |
State | Published - 2025 |
Bibliographical note
Publisher Copyright: © 2024 The Author. Published by Oxford University Press. All rights reserved.
Keywords
- Bayesian inference
- microbiome data
- missing data
- multiple imputation
- multiway data
PubMed: MeSH publication types
- Journal Article