Linked matrix factorization

Michael J. O'Connell, Eric F. Lock

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Several recent methods address the dimension reduction and decomposition of linked high-content data matrices. Typically, these methods consider one dimension, rows or columns, that is shared among the matrices. This shared dimension may represent common features measured for different sample sets (horizontal integration) or a common sample set with features from different platforms (vertical integration). We introduce an approach for simultaneous horizontal and vertical integration, Linked Matrix Factorization (LMF), for the general case where some matrices share rows (e.g., features) and some share columns (e.g., samples). Our motivating application is a cytotoxicity study with accompanying genomic and molecular chemical attribute data. The toxicity matrix (cell lines × chemicals) shares samples with a genotype matrix (cell lines × SNPs) and shares features with a molecular attribute matrix (chemicals × attributes). LMF gives a unified low-rank factorization of these three matrices, which allows for the decomposition of systematic variation that is shared and systematic variation that is specific to each matrix. This allows for efficient dimension reduction, exploratory visualization, and the imputation of missing data even when entire rows or columns are missing. We present theoretical results concerning the uniqueness, identifiability, and minimal parametrization of LMF, and evaluate it with extensive simulation studies.
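To make the linked structure concrete, here is a minimal sketch in Python of a joint low-rank fit to three matrices arranged as in the motivating application. It is not the authors' algorithm. It assumes a simplified model with shared structure only, in which toxicity ≈ U Vᵀ, genotype ≈ U Gᵀ, and attributes ≈ V Aᵀ, so that the cell-line scores U and chemical scores V are each reused across two matrices. The function name `fit_linked_factors`, the factor names, and the alternating least squares scheme are all illustrative assumptions. The matrix-specific variation and the missing-data handling described in the abstract are not modeled.

```python
import numpy as np


def fit_linked_factors(tox, geno, attr, rank, n_iter=50, seed=0):
    """Alternating least squares for three linked matrices (illustrative only).

    tox  : cell lines x chemicals
    geno : cell lines x SNPs        (shares rows with tox)
    attr : chemicals x attributes   (rows match the columns of tox)
    """
    rng = np.random.default_rng(seed)
    m, n = tox.shape
    U = rng.standard_normal((m, rank))              # cell-line scores
    V = rng.standard_normal((n, rank))              # chemical scores
    G = rng.standard_normal((geno.shape[1], rank))  # SNP loadings
    A = rng.standard_normal((attr.shape[1], rank))  # attribute loadings

    def loss():
        # Sum of squared residuals over all three matrices.
        return (np.sum((tox - U @ V.T) ** 2)
                + np.sum((geno - U @ G.T) ** 2)
                + np.sum((attr - V @ A.T) ** 2))

    history = [loss()]
    for _ in range(n_iter):
        # U appears in tox and geno: solve [tox, geno] ~ U [V; G]^T.
        U = np.linalg.lstsq(np.vstack([V, G]),
                            np.hstack([tox, geno]).T, rcond=None)[0].T
        # V appears in tox^T and attr: solve [tox^T, attr] ~ V [U; A]^T.
        V = np.linalg.lstsq(np.vstack([U, A]),
                            np.hstack([tox.T, attr]).T, rcond=None)[0].T
        # G and A each appear in a single matrix.
        G = np.linalg.lstsq(U, geno, rcond=None)[0].T
        A = np.linalg.lstsq(V, attr, rcond=None)[0].T
        history.append(loss())
    return U, V, G, A, history
```

Each update is an exact least squares solve for one block of factors with the others held fixed, so the recorded loss should not increase from one sweep to the next. Calling `fit_linked_factors(tox, geno, attr, rank=2)` on three arrays with compatible shapes returns the four factor matrices and the loss history.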

Original language: English (US)
Pages (from-to): 582-592
Number of pages: 11
Journal: Biometrics
Volume: 75
Issue number: 2
State: Published - Jun 2019

Bibliographical note

Funding Information:
This work was supported by the National Institutes of Health National Center for Advancing Translational Sciences (NIH / NCATS) [ULI RR033183 & KL2 RR0333182].

Publisher Copyright:
© 2019 International Biometric Society

Keywords

  • data integration
  • dimension reduction
  • massive data sets
  • missing data imputation
  • principal components analysis
