Dimension reduction of high-dimensional microbiome data facilitates subsequent analyses such as regression and clustering. Most existing reduction methods cannot fully accommodate the special features of the data, such as count-valued reads and excessive zeros. In this paper, we propose a zero-inflated Poisson factor analysis model. The model assumes that microbiome read counts follow zero-inflated Poisson distributions, with library size as an offset and with Poisson rates negatively related to the occurrence of inflated zeros. The latent parameters of the model form a low-rank matrix consisting of interpretable loadings and low-dimensional scores that can be used for further analyses. We develop an efficient and robust expectation-maximization algorithm for parameter estimation. We demonstrate the efficacy of the proposed method using comprehensive simulation studies. An application to the Oral Infections, Glucose Intolerance, and Insulin Resistance Study provides valuable insights into the relation between the subgingival microbiome and periodontal disease.
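The generative structure described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a log link for the Poisson rate, with the log library size as an offset added to a low-rank product of scores and loadings, and it assumes a logistic link, logit(p) = -tau * log(rate), to make the zero-inflation probability decrease as the rate increases. The function name `simulate_zip_factor_counts`, the parameter `tau`, and all numeric settings are hypothetical choices made for illustration, and the library sizes are kept unrealistically small so the output is easy to inspect.

```python
import numpy as np


def simulate_zip_factor_counts(n_samples=50, n_taxa=30, rank=3, tau=0.5, seed=0):
    """Draw counts from an assumed zero-inflated Poisson factor model.

    All link functions and parameter values here are illustrative
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Low-dimensional scores (one row per sample) and loadings (one row per taxon).
    scores = rng.normal(0.0, 0.5, size=(n_samples, rank))
    loadings = rng.normal(0.0, 0.5, size=(n_taxa, rank))

    # Per-sample log library size, used as an offset (assumed log link).
    log_library_size = rng.normal(2.0, 0.3, size=(n_samples, 1))

    # Log Poisson rate = offset + low-rank matrix of latent parameters.
    log_rate = log_library_size + scores @ loadings.T
    rate = np.exp(log_rate)

    # Assumed link: logit(p) = -tau * log(rate), so that a larger rate
    # implies a smaller probability of an inflated (structural) zero.
    zero_prob = 1.0 / (1.0 + np.exp(tau * log_rate))

    # Mix structural zeros with ordinary Poisson draws.
    structural_zero = rng.random(size=rate.shape) < zero_prob
    counts = rng.poisson(rate)
    counts[structural_zero] = 0
    return counts, rate, zero_prob


if __name__ == "__main__":
    counts, rate, zero_prob = simulate_zip_factor_counts()
    print("count matrix shape:", counts.shape)
    print("fraction of zeros:", float(np.mean(counts == 0)))
```

Under these assumptions, the zero fraction in the simulated matrix exceeds what a plain Poisson model with the same rates would produce, which is the feature the zero-inflation component is meant to capture.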
Bibliographical note
Funding Information:
Research reported in this publication was supported by the National Institute of Dental & Craniofacial Research of the National Institutes of Health under award number R03DE027773.
© 2020 The International Biometric Society
Keywords
- 16S sequencing
- factor analysis
- low rank
- microbiome data
- zero inflation
PubMed: MeSH publication types
- Journal Article
- Research Support, N.I.H., Extramural