Penalized co-inertia analysis with applications to -omics data

Eun Jeong Min, Sandra E. Safo, Qi Long

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Motivation: Co-inertia analysis (CIA) is a multivariate statistical analysis method that can assess relationships and trends in two sets of data. Recently, CIA has been used for integrative analysis of multiple high-dimensional omics data sets. However, in classical CIA all elements of the loading vectors are nonzero, which makes the results hard to interpret when analyzing omics data. For other multivariate statistical methods, such as canonical correlation analysis (CCA) and penalized least squares (PLS), various approaches have been proposed to produce sparse loading vectors via l1-penalization/constraint. We propose a novel CIA method that uses l1-penalization to induce sparsity in the estimators of the loading vectors. Our method conducts model fitting and variable selection simultaneously. We also propose a second CIA method that, in addition to the sparsity penalty, incorporates structure/network information, such as that from functional genomics, so that one can obtain biologically meaningful and interpretable results.

Results: Extensive simulations demonstrate that our proposed penalized CIA methods achieve the best or close to the best performance, compared to the existing CIA method, in terms of feature selection and recovery of the true loading vectors. We also apply our methods to the integrative analysis of gene expression data and protein abundance data from the NCI-60 cancer cell lines. This analysis identifies variables that are meaningful for cancer and yields biologically meaningful results that are consistent with previous studies.
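The idea of inducing sparsity in loading vectors through an l1 penalty can be illustrated with a small sketch. This is not the authors' algorithm: it assumes uniform row weights and identity column metrics (so the first pair of co-inertia loading vectors reduces to the leading singular vectors of the cross-covariance matrix), and it swaps in a generic alternating soft-thresholded power iteration. The function names, the tuning parameters `lam_a` and `lam_b`, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding, the proximal operator of lam * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def _normalize(v):
    """Scale v to unit Euclidean norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_coinertia_pair(X, Y, lam_a=0.0, lam_b=0.0, n_iter=200, tol=1e-8):
    """First pair of loading vectors (a, b) for the cross-covariance of X and Y.

    Illustrative assumptions: uniform row weights, identity column metrics,
    and an alternating soft-thresholded power iteration. With
    lam_a = lam_b = 0 the result is the leading pair of singular vectors
    of the cross-covariance matrix (dense loadings).
    """
    Xc = X - X.mean(axis=0)            # column-center both tables
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / X.shape[0]         # p x q cross-covariance matrix

    # Start from the leading right singular vector of C.
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    b = vt[0]
    a = np.zeros(C.shape[0])
    for _ in range(n_iter):
        a_new = _normalize(soft_threshold(C @ b, lam_a))
        b_new = _normalize(soft_threshold(C.T @ a_new, lam_b))
        converged = (np.linalg.norm(a_new - a) < tol
                     and np.linalg.norm(b_new - b) < tol)
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Simulated example: only the first 3 columns of X and the first 2 columns
# of Y depend on a shared latent factor z; all other columns are pure noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = rng.normal(size=(n, 10))
X[:, :3] += 2.0 * z[:, None]
Y = rng.normal(size=(n, 8))
Y[:, :2] += 2.0 * z[:, None]

a_dense, b_dense = sparse_coinertia_pair(X, Y)                     # no penalty
a_sparse, b_sparse = sparse_coinertia_pair(X, Y, lam_a=1.0, lam_b=1.0)

print("nonzero X loadings (dense): ", np.flatnonzero(a_dense))
print("nonzero X loadings (sparse):", np.flatnonzero(a_sparse))
print("nonzero Y loadings (sparse):", np.flatnonzero(b_sparse))
```

In this sketch the penalty levels are fixed by hand; in practice they would need to be tuned, for example by cross-validation. The structure/network-informed penalty mentioned in the abstract is not covered here.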

Original language: English (US)
Pages (from-to): 1018-1025
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 6
DOIs: 10.1093/bioinformatics/bty726
State: Published - Mar 15 2019

Fingerprint

Inertia
Statistical methods
Cells
Cancer
Gene expression
Penalization
Feature extraction
Sparsity
Proteins
Recovery
Cell Line
Neoplasms
Information Services
Multivariate Statistical Analysis
Penalized Least Squares
Functional Genomics
Genomics
Least-Squares Analysis
Canonical Correlation Analysis
Line

Cite this

Penalized co-inertia analysis with applications to -omics data. / Min, Eun Jeong; Safo, Sandra E.; Long, Qi.

In: Bioinformatics, Vol. 35, No. 6, 15.03.2019, p. 1018-1025.

Min, Eun Jeong; Safo, Sandra E.; Long, Qi. / Penalized co-inertia analysis with applications to -omics data. In: Bioinformatics. 2019; Vol. 35, No. 6. pp. 1018-1025.
@article{feaf393641134a9ba2e68913c36f91c1,
title = "Penalized co-inertia analysis with applications to -omics data",
author = "Min, {Eun Jeong} and Safo, {Sandra E.} and Long, Qi",
year = "2019",
month = "3",
day = "15",
doi = "10.1093/bioinformatics/bty726",
language = "English (US)",
volume = "35",
pages = "1018--1025",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR

T1 - Penalized co-inertia analysis with applications to -omics data

AU - Min, Eun Jeong

AU - Safo, Sandra E.

AU - Long, Qi

PY - 2019/3/15

Y1 - 2019/3/15

UR - http://www.scopus.com/inward/record.url?scp=85063007342&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063007342&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty726

DO - 10.1093/bioinformatics/bty726

M3 - Article

C2 - 30165424

AN - SCOPUS:85063007342

VL - 35

SP - 1018

EP - 1025

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 6

ER -