Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data

Sandra E. Safo, Jeongyoun Ahn, Yongho Jeon, Sungkyu Jung

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We present a method for individual and integrative analysis of high dimension, low sample size data that capitalizes on the recurring theme in multivariate analysis of projecting higher dimensional data onto a few meaningful directions that are solutions to a generalized eigenvalue problem. We propose a general framework, called SELP (Sparse Estimation with Linear Programming), with which one can obtain a sparse estimate for a solution vector of a generalized eigenvalue problem. We demonstrate the utility of SELP on canonical correlation analysis for an integrative analysis of methylation and gene expression profiles from a breast cancer study, and we identify some genes known to be associated with breast carcinogenesis, which indicates that the proposed method is capable of generating biologically meaningful insights. Simulation studies suggest that the proposed method performs competitively in comparison with some existing methods in identifying true signals in various underlying covariance structures.
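To make the connection between canonical correlation analysis and a generalized eigenvalue problem concrete, the following is a minimal sketch of classical (non-sparse) CCA, in which the first canonical vector a solves Sxy Syy^{-1} Syx a = rho^2 Sxx a. It is not the SELP procedure described in the abstract: there is no sparsity and no linear programming step. The function name `cca_first_pair`, the small `ridge` term, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def cca_first_pair(X, Y, ridge=1e-8):
    """First canonical pair from the generalized eigenproblem M a = rho^2 Sxx a.

    Illustrative only: classical CCA, not the sparse SELP estimator.
    `ridge` is an assumed small stabilizer for the covariance matrices.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # M = Sxy Syy^{-1} Syx (symmetric, positive semidefinite)
    M = Sxy @ np.linalg.solve(Syy, Sxy.T)

    # Reduce M a = lambda Sxx a to a standard symmetric eigenproblem
    # using the Cholesky factor Sxx = L L^T and the substitution w = L^T a.
    L = np.linalg.cholesky(Sxx)
    C = np.linalg.solve(L, np.linalg.solve(L, M).T)  # L^{-1} M L^{-T}
    C = (C + C.T) / 2  # remove round-off asymmetry

    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    rho = np.sqrt(max(eigvals[-1], 0.0))
    a = np.linalg.solve(L.T, eigvecs[:, -1])  # satisfies a^T Sxx a = 1

    # The matching direction for Y is proportional to Syy^{-1} Syx a.
    b = np.linalg.solve(Syy, Sxy.T @ a)
    b = b / np.sqrt(b @ Syy @ b)
    return a, b, rho

# Simulated data with one shared latent signal (hypothetical example).
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.outer(z, [1.0, 0.5, 0.0, 0.0]) + rng.normal(size=(200, 4))
Y = np.outer(z, [0.8, 0.0, 0.0]) + rng.normal(size=(200, 3))
a, b, rho = cca_first_pair(X, Y)
```

In the high dimension, low sample size setting the sample covariance matrices are singular and the vectors a and b are dense, which is the situation a sparse estimator such as SELP is designed to address.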

Original language: English (US)
Pages (from-to): 1362-1371
Number of pages: 10
Journal: Biometrics
Volume: 74
Issue number: 4
DOI: 10.1111/biom.12886
State: Published - Dec 2018

Fingerprint

Canonical Correlation Analysis
Generalized Eigenvalue Problem
Methylation
Gene Expression Data
Gene Expression
Linear Programming
Genes
Gene Expression Profile
Carcinogenesis
Multivariate Analysis
Covariance Structure
High-dimensional Data
Methodology
Breast Cancer
Transcriptome

Keywords

  • Canonical Correlation Analysis
  • Data Integration
  • Generalized Eigenvalue Problem
  • High Dimension
  • Low Sample Size
  • Sparsity

Cite this

Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data. / Safo, Sandra E.; Ahn, Jeongyoun; Jeon, Yongho; Jung, Sungkyu.

In: Biometrics, Vol. 74, No. 4, 12.2018, p. 1362-1371.


@article{55312a3ad0654e6991900aea808cf9c2,
title = "Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data",
abstract = "We present a method for individual and integrative analysis of high dimension, low sample size data that capitalizes on the recurring theme in multivariate analysis of projecting higher dimensional data onto a few meaningful directions that are solutions to a generalized eigenvalue problem. We propose a general framework, called SELP (Sparse Estimation with Linear Programming), with which one can obtain a sparse estimate for a solution vector of a generalized eigenvalue problem. We demonstrate the utility of SELP on canonical correlation analysis for an integrative analysis of methylation and gene expression profiles from a breast cancer study, and we identify some genes known to be associated with breast carcinogenesis, which indicates that the proposed method is capable of generating biologically meaningful insights. Simulation studies suggest that the proposed method performs competitively in comparison with some existing methods in identifying true signals in various underlying covariance structures.",
keywords = "Canonical Correlation Analysis, Data Integration, Generalized Eigenvalue Problem, High Dimension, Low Sample Size, Sparsity",
author = "Safo, {Sandra E.} and Jeongyoun Ahn and Yongho Jeon and Sungkyu Jung",
year = "2018",
month = "12",
doi = "10.1111/biom.12886",
language = "English (US)",
volume = "74",
pages = "1362--1371",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "4",

}

TY - JOUR

T1 - Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data

AU - Safo, Sandra E.

AU - Ahn, Jeongyoun

AU - Jeon, Yongho

AU - Jung, Sungkyu

PY - 2018/12

Y1 - 2018/12

N2 - We present a method for individual and integrative analysis of high dimension, low sample size data that capitalizes on the recurring theme in multivariate analysis of projecting higher dimensional data onto a few meaningful directions that are solutions to a generalized eigenvalue problem. We propose a general framework, called SELP (Sparse Estimation with Linear Programming), with which one can obtain a sparse estimate for a solution vector of a generalized eigenvalue problem. We demonstrate the utility of SELP on canonical correlation analysis for an integrative analysis of methylation and gene expression profiles from a breast cancer study, and we identify some genes known to be associated with breast carcinogenesis, which indicates that the proposed method is capable of generating biologically meaningful insights. Simulation studies suggest that the proposed method performs competitively in comparison with some existing methods in identifying true signals in various underlying covariance structures.


KW - Canonical Correlation Analysis

KW - Data Integration

KW - Generalized Eigenvalue Problem

KW - High Dimension

KW - Low Sample Size

KW - Sparsity

UR - http://www.scopus.com/inward/record.url?scp=85061006938&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061006938&partnerID=8YFLogxK

U2 - 10.1111/biom.12886

DO - 10.1111/biom.12886

M3 - Article

C2 - 29750830

AN - SCOPUS:85061006938

VL - 74

SP - 1362

EP - 1371

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 4

ER -