sJIVE: Supervised joint and individual variation explained

Elise F Palzer, Christine H Wendt, Russell P. Bowler, Craig P. Hersh, Sandra E. Safo, Eric F. Lock

Research output: Contribution to journal › Article › peer-review

Abstract

Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are limited because they either (1) consider only the data structure shared by all datasets while ignoring structures unique to each source, or (2) extract underlying structures first without considering the outcome. The proposed method, supervised joint and individual variation explained (sJIVE), can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. These two components are weighted to compromise between explaining variation in the multi-source data and in the outcome. In simulations, sJIVE outperforms existing methods when large amounts of noise are present in the multi-source data. An application to data from the COPDGene study explores gene expression and proteomic patterns associated with lung function.
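
The weighting described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it only evaluates one plausible form of a weighted criterion, in which a weight eta trades off the reconstruction error of the data sources against the squared error of a linear prediction built from the joint and individual scores. The function name, argument names, matrix orientation (features by subjects), and the exact form of the criterion are all assumptions made for illustration and are not taken from the abstract.

```python
import numpy as np

def weighted_loss(X_list, y, U_list, W_list, S_joint, S_indiv_list,
                  theta_joint, theta_indiv_list, eta):
    """Illustrative weighted criterion (all names here are hypothetical).

    X_list[k]           : (p_k, n) data matrix for source k
    y                   : (n,) outcome vector
    U_list[k]           : (p_k, r_J) joint loadings for source k
    W_list[k]           : (p_k, r_k) individual loadings for source k
    S_joint             : (r_J, n) joint scores shared by all sources
    S_indiv_list[k]     : (r_k, n) individual scores for source k
    theta_joint         : (r_J,) coefficients on the joint scores
    theta_indiv_list[k] : (r_k,) coefficients on the individual scores
    eta                 : weight in [0, 1] on the data-reconstruction term
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")

    # Squared reconstruction error of each source from joint + individual parts.
    data_term = 0.0
    for X, U, W, S_k in zip(X_list, U_list, W_list, S_indiv_list):
        resid = X - U @ S_joint - W @ S_k
        data_term += np.sum(resid ** 2)

    # Linear prediction of the outcome from the same scores.
    y_hat = theta_joint @ S_joint
    for theta_k, S_k in zip(theta_indiv_list, S_indiv_list):
        y_hat = y_hat + theta_k @ S_k
    outcome_term = np.sum((y - y_hat) ** 2)

    # eta near 1 favors explaining the data; eta near 0 favors the outcome.
    return eta * data_term + (1.0 - eta) * outcome_term
```

Under these assumptions, setting eta to 1 ignores the outcome entirely, while setting it to 0 ignores the fit to the data sources; intermediate values give the compromise the abstract describes. How such a criterion is actually minimized, and how the weight and ranks are chosen, is not specified in the abstract and is left out here.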

Original language: English (US)
Article number: 107547
Journal: Computational Statistics and Data Analysis
Volume: 175
State: Published - Nov 2022

Bibliographical note

Funding Information:
The views expressed in this article are those of the authors and do not reflect the views of the United States Government, the Department of Veterans Affairs, the funders, the sponsors, or any of the authors' affiliated academic institutions. This work was partially supported by grants R01-GM130622 and 1R35GM142695-01 from the National Institutes of Health and by Award Number U01 HL089897 and Award Number U01 HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health. COPDGene is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, Siemens, and Sunovion.

Publisher Copyright:
© 2022 Elsevier B.V.

Keywords

  • Data integration
  • Dimension reduction
  • Genomic data
  • High-dimensional prediction
  • Multi-source data
  • Multi-view learning
