Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information

Sandra E. Safo, Shuzhao Li, Qi Long

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.
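To make the idea of sparse CCA concrete, here is a minimal sketch of a generic L1-penalized variant: alternating soft-thresholding on the sample cross-covariance matrix, with the within-set covariances treated as identity. It is not the structured method proposed in the article, because it uses no network information among genes or metabolites. The function names, the penalty values, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam; entries with |a| <= lam become 0."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]


def normalize(vec):
    """Scale to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)


def cross_covariance(X, Y):
    """Sample cross-covariance (p x q) between the columns of X and Y."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    mean_x = [sum(row[j] for row in X) / n for j in range(p)]
    mean_y = [sum(row[k] for row in Y) / n for k in range(q)]
    return [[sum((X[i][j] - mean_x[j]) * (Y[i][k] - mean_y[k])
                 for i in range(n)) / (n - 1)
             for k in range(q)] for j in range(p)]


def sparse_cca(X, Y, lam_u, lam_v, n_iter=50):
    """First pair of sparse canonical vectors via alternating soft-thresholding."""
    C = cross_covariance(X, Y)
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)  # simple deterministic starting point
    u = [0.0] * p
    for _ in range(n_iter):
        # Update u given v: threshold C v, then rescale to unit length.
        u = normalize(soft_threshold(
            [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)], lam_u))
        # Update v given u: threshold C^T u, then rescale to unit length.
        v = normalize(soft_threshold(
            [sum(C[j][k] * u[j] for j in range(p)) for k in range(q)], lam_v))
    return u, v


def simulate(n=400, p=6, q=5, seed=0):
    """Toy data: the first two columns of X and of Y share one latent factor."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        X.append([z + 0.3 * rng.gauss(0, 1) if j < 2 else rng.gauss(0, 1)
                  for j in range(p)])
        Y.append([z + 0.3 * rng.gauss(0, 1) if k < 2 else rng.gauss(0, 1)
                  for k in range(q)])
    return X, Y


X, Y = simulate()
u, v = sparse_cca(X, Y, lam_u=0.4, lam_v=0.4)
print("selected X variables:", [j for j, a in enumerate(u) if a != 0.0])
print("selected Y variables:", [k for k, a in enumerate(v) if a != 0.0])
```

In this toy setting only the first two variables of each set carry shared signal, so those are the ones a sparse method would be hoped to select. A structured variant in the spirit of the abstract would replace or augment the plain L1 shrinkage step with a penalty that uses known gene and metabolite network structure.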

Original language: English (US)
Pages (from-to): 300-312
Number of pages: 13
Journal: Biometrics
Volume: 74
Issue number: 1
DOI: 10.1111/biom.12715
State: Published - Mar 2018
Externally published: Yes

Fingerprint

Metabolomics
Canonical Correlation Analysis
Sparse Data
Transcriptomics
Metabolites
Genes
Cardiovascular Diseases
Health
Functional Relationship
Information Services
Knowledge-based
Metabolic Networks and Pathways
Data-driven
Statistical method
Biochemical pathways

Keywords

  • Biological information
  • Canonical correlation analysis
  • High dimension
  • Integrative analysis
  • Low sample size
  • Sparsity
  • Structural information

PubMed: MeSH publication types

  • Journal Article
  • Research Support, N.I.H., Extramural

Cite this

Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. / Safo, Sandra E.; Li, Shuzhao; Long, Qi.

In: Biometrics, Vol. 74, No. 1, 03.2018, p. 300-312.


@article{82ab74f51d704f0ca7265ed8d24231db,
title = "Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information",
abstract = "Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.",
keywords = "Biological information, Canonical correlation analysis, High dimension, Integrative analysis, Low sample size, Sparsity, Structural information",
author = "Safo, Sandra E. and Li, Shuzhao and Long, Qi",
year = "2018",
month = "3",
doi = "10.1111/biom.12715",
language = "English (US)",
volume = "74",
pages = "300--312",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1"
}

TY - JOUR
T1 - Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information
AU - Safo, Sandra E.
AU - Li, Shuzhao
AU - Long, Qi
PY - 2018/3
Y1 - 2018/3
AB - Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.
KW - Biological information
KW - Canonical correlation analysis
KW - High dimension
KW - Integrative analysis
KW - Low sample size
KW - Sparsity
KW - Structural information
UR - http://www.scopus.com/inward/record.url?scp=85019044088&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019044088&partnerID=8YFLogxK
U2 - 10.1111/biom.12715
DO - 10.1111/biom.12715
M3 - Article
C2 - 28482123
AN - SCOPUS:85019044088
VL - 74
SP - 300
EP - 312
JO - Biometrics
JF - Biometrics
SN - 0006-341X
IS - 1
ER -