A strategy for multimodal data integration: Application to biomarkers identification in spinocerebellar ataxia

Imene Garali, Isaac Adanyeguh, Farid Ichou, Vincent Perlbarg, Alexandre Seyer, Benoit Colsch, Ivan Moszer, Vincent Guillemot, Alexandra Durr, Fanny Mochel, Arthur Tenenhaus

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


The growing number of modalities (e.g. multi-omics, imaging and clinical data) characterizing a given disease provides physicians and statisticians with complementary facets reflecting the disease process but emphasizes the need for novel statistical methods of data analysis able to unify these views. Such data sets are indeed intrinsically structured in blocks, where each block represents a set of variables observed on a group of individuals. Therefore, classical statistical tools cannot be applied without altering their organization, with the risk of information loss. Regularized generalized canonical correlation analysis (RGCCA) and its sparse generalized canonical correlation analysis (SGCCA) counterpart are component-based methods for exploratory analyses of data sets structured in blocks of variables. Rather than operating sequentially on parts of the measurements, the RGCCA/SGCCA-based integrative analysis method aims at summarizing the relevant information between and within the blocks. It processes a priori information defining which blocks are supposed to be linked to one another, thus reflecting hypotheses about the biology underlying the data blocks. It also requires the setting of extra parameters that need to be carefully adjusted. Here, we provide practical guidelines for the use of RGCCA/SGCCA. We also illustrate the flexibility and usefulness of RGCCA/SGCCA on a unique cohort of patients with four genetic subtypes of spinocerebellar ataxia, in which we obtained multiple data sets from brain volumetry and magnetic resonance spectroscopy, and metabolomic and lipidomic analyses. As a first step toward the extraction of multimodal biomarkers, and through the reduction to a few meaningful components and the visualization of relevant variables, we identified possible markers of disease progression.
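The role of the a priori block connections can be illustrated with a small sketch. The code below is not the authors' implementation: it is a simplified RGCCA-style iteration restricted to one component per block, under the assumptions of unit-norm weight vectors (shrinkage parameter fixed at 1) and the Horst scheme (identity link function). The function name `rgcca_first_components`, the synthetic data, and the three-block design matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def rgcca_first_components(blocks, design, n_iter=200, tol=1e-10):
    """Simplified RGCCA-style sketch: one component per block,
    unit-norm weights (tau = 1), Horst scheme. Illustrative only."""
    # Center the columns of every block.
    blocks = [X - X.mean(axis=0) for X in blocks]
    # Start from normalized all-ones weight vectors.
    weights = [np.ones(X.shape[1]) / np.sqrt(X.shape[1]) for X in blocks]
    comps = [X @ a for X, a in zip(blocks, weights)]
    for _ in range(n_iter):
        max_change = 0.0
        for j, X in enumerate(blocks):
            # Inner component: sum of the components of the connected blocks.
            z = sum(design[j, k] * comps[k] for k in range(len(blocks)))
            # Outer weights: direction maximizing covariance with z.
            a = X.T @ z
            a /= np.linalg.norm(a)
            max_change = max(max_change, np.linalg.norm(a - weights[j]))
            weights[j] = a
            comps[j] = X @ a
        if max_change < tol:
            break
    return weights, comps


# Synthetic example: three blocks measured on the same 50 individuals,
# all driven by one shared latent signal plus noise (assumed data).
rng = np.random.default_rng(0)
latent = rng.standard_normal(50)
blocks = [
    np.outer(latent, rng.standard_normal(p)) + 0.5 * rng.standard_normal((50, p))
    for p in (5, 8, 3)
]

# Hypothetical design: block 0 is linked to blocks 1 and 2,
# while blocks 1 and 2 are not linked to each other.
design = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]])

weights, comps = rgcca_first_components(blocks, design)
```

Changing an entry of `design` from 0 to 1 (symmetrically) encodes the hypothesis that two blocks share information. The full method additionally tunes the shrinkage parameters and, for SGCCA, a sparsity constraint, neither of which is covered by this sketch.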

Original language: English (US)
Pages (from-to): 1356-1369
Number of pages: 14
Journal: Briefings in Bioinformatics
Issue number: 6
State: Published - May 30 2017
Externally published: Yes

Bibliographical note

Funding Information:
This study was sponsored by the Assistance-Publique des Hôpitaux de Paris and supported by grants from the French Ministry of Health (PHRC BIOSCA - ID RCB: 2010-A01324-35), the Cognacq-Jay foundation, the program ‘Investissements d’avenir’ ANR-10-IAIHU-06 and the patients’ association Connaitre les Syndromes Cérébelleux (CSC).

Publisher Copyright:
© The Author 2017. Published by Oxford University Press. All rights reserved.


  • Biomarker discovery
  • Data integration
  • Regularized Generalized Canonical Correlation Analysis
  • Spinocerebellar ataxia

