Biclustering via sparse clustering

Erika S. Helgeson, Qian Liu, Guanhua Chen, Michael R. Kosorok, Eric Bair

Research output: Contribution to journal › Article

Abstract

In identifying subgroups of a heterogeneous disease or condition, it is often desirable to identify both the observations and the features which differ between subgroups. For instance, it may be that there is a subgroup of individuals with a certain disease who differ from the rest of the population based on the expression profile for only a subset of genes. Identifying the subgroup of patients and subset of genes could lead to better-targeted therapy. We can represent the subgroup of individuals and genes as a bicluster: a submatrix of a larger data matrix such that the features and observations in the submatrix differ from those not contained in it. We present a novel two-step method, SC-Biclust, for identifying this bicluster. In the first step, the observations in the bicluster are identified to maximize the sum of the weighted between-cluster feature differences. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. This versatile method can be used to identify biclusters that differ on the basis of feature means, feature variances, or more general differences. The bicluster identification accuracy of SC-Biclust is illustrated through several simulated studies. Application of SC-Biclust to pain research illustrates its ability to identify biologically meaningful subgroups.
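The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SC-Biclust implementation: it assumes a generic sparse 2-means procedure (feature-weighted 2-means alternated with soft-thresholded feature weights), treats the smaller cluster as the bicluster's observations, and keeps every feature with a nonzero weight as a stand-in for the paper's feature-selection rule. All function names, the toy data, and the sparsity bound `s` are hypothetical.

```python
import math
import random


def weighted_two_means(X, w, iters=25):
    """Lloyd's algorithm with K=2 using feature-weighted squared distances."""
    def dist(a, b):
        return sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b))
    # Deterministic start: row 0 and the row farthest from it.
    centers = [X[0], max(X, key=lambda r: dist(r, X[0]))]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [0 if dist(r, centers[0]) <= dist(r, centers[1]) else 1 for r in X]
        for k in (0, 1):
            rows = [r for r, lab in zip(X, labels) if lab == k]
            if rows:
                centers[k] = [sum(col) / len(rows) for col in zip(*rows)]
    return labels


def column_ss(rows):
    """Per-feature sum of squared deviations from the column mean."""
    out = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        out.append(sum((v - m) ** 2 for v in col))
    return out


def between_cluster_ss(X, labels):
    """Per-feature between-cluster sum of squares (total minus within)."""
    within = [0.0] * len(X[0])
    for k in (0, 1):
        rows = [r for r, lab in zip(X, labels) if lab == k]
        if rows:
            within = [a + b for a, b in zip(within, column_ss(rows))]
    return [max(t - wi, 0.0) for t, wi in zip(column_ss(X), within)]


def update_weights(a, s):
    """Soft-threshold a and rescale to unit L2 norm so the L1 norm is at most s."""
    def scaled(delta):
        t = [max(aj - delta, 0.0) for aj in a]
        norm = math.sqrt(sum(v * v for v in t))
        return [v / norm for v in t] if norm > 0 else t
    if sum(scaled(0.0)) <= s:
        return scaled(0.0)
    lo, hi = 0.0, max(a)
    for _ in range(50):  # bisection on the soft-threshold level
        mid = (lo + hi) / 2
        if sum(scaled(mid)) > s:
            lo = mid
        else:
            hi = mid
    return scaled(hi)


def sparse_bicluster(X, s, rounds=10):
    """Step 1: cluster observations; step 2: keep features with nonzero weight."""
    p = len(X[0])
    w = [1 / math.sqrt(p)] * p
    for _ in range(rounds):
        labels = weighted_two_means(X, w)
        w = update_weights(between_cluster_ss(X, labels), s)
    small = min((0, 1), key=labels.count)  # assumption: smaller cluster = bicluster
    rows = [i for i, lab in enumerate(labels) if lab == small]
    cols = [j for j, wj in enumerate(w) if wj > 0]
    return rows, cols


def toy_data(seed=0):
    """12 observations x 6 features; rows 0-3 are shifted in features 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(12)]
    for i in range(4):
        X[i][0] += 5.0
        X[i][1] += 4.0
    return X


rows, cols = sparse_bicluster(toy_data(), s=1.3)
print("bicluster observations:", rows)
print("bicluster features:", cols)
```

The bound `s` controls sparsity and is assumed to lie between 1 and the square root of the number of features; smaller values push more feature weights to zero. The sketch only targets differences in feature means, whereas the abstract states that the method also handles differences in variances and more general differences.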

Original language: English (US)
Journal: Biometrics
DOI: https://doi.org/10.1111/biom.13136
State: Accepted/In press - Jan 1 2019

Keywords

  • biclustering
  • hierarchical clustering
  • high-dimensional data
  • k-means clustering
  • sparse clustering

PubMed: MeSH publication types

  • Journal Article

Cite this

Helgeson, E. S., Liu, Q., Chen, G., Kosorok, M. R., & Bair, E. (Accepted/In press). Biclustering via sparse clustering. Biometrics. https://doi.org/10.1111/biom.13136

Biclustering via sparse clustering. / Helgeson, Erika S.; Liu, Qian; Chen, Guanhua; Kosorok, Michael R.; Bair, Eric.

In: Biometrics, 01.01.2019.

Helgeson, Erika S. ; Liu, Qian ; Chen, Guanhua ; Kosorok, Michael R. ; Bair, Eric. / Biclustering via sparse clustering. In: Biometrics. 2019.
@article{373af5a585274c09b72dc49d968d5f11,
title = "Biclustering via sparse clustering",
abstract = "In identifying subgroups of a heterogeneous disease or condition, it is often desirable to identify both the observations and the features which differ between subgroups. For instance, it may be that there is a subgroup of individuals with a certain disease who differ from the rest of the population based on the expression profile for only a subset of genes. Identifying the subgroup of patients and subset of genes could lead to better-targeted therapy. We can represent the subgroup of individuals and genes as a bicluster: a submatrix of a larger data matrix such that the features and observations in the submatrix differ from those not contained in it. We present a novel two-step method, SC-Biclust, for identifying this bicluster. In the first step, the observations in the bicluster are identified to maximize the sum of the weighted between-cluster feature differences. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. This versatile method can be used to identify biclusters that differ on the basis of feature means, feature variances, or more general differences. The bicluster identification accuracy of SC-Biclust is illustrated through several simulated studies. Application of SC-Biclust to pain research illustrates its ability to identify biologically meaningful subgroups.",
keywords = "biclustering, hierarchical clustering, high-dimensional data, k-means clustering, sparse clustering",
author = "Helgeson, {Erika S.} and Qian Liu and Guanhua Chen and Kosorok, {Michael R.} and Eric Bair",
year = "2019",
month = "1",
day = "1",
doi = "10.1111/biom.13136",
language = "English (US)",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",

}

TY - JOUR

T1 - Biclustering via sparse clustering

AU - Helgeson, Erika S.

AU - Liu, Qian

AU - Chen, Guanhua

AU - Kosorok, Michael R.

AU - Bair, Eric

PY - 2019/1/1

Y1 - 2019/1/1

AB - In identifying subgroups of a heterogeneous disease or condition, it is often desirable to identify both the observations and the features which differ between subgroups. For instance, it may be that there is a subgroup of individuals with a certain disease who differ from the rest of the population based on the expression profile for only a subset of genes. Identifying the subgroup of patients and subset of genes could lead to better-targeted therapy. We can represent the subgroup of individuals and genes as a bicluster: a submatrix of a larger data matrix such that the features and observations in the submatrix differ from those not contained in it. We present a novel two-step method, SC-Biclust, for identifying this bicluster. In the first step, the observations in the bicluster are identified to maximize the sum of the weighted between-cluster feature differences. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. This versatile method can be used to identify biclusters that differ on the basis of feature means, feature variances, or more general differences. The bicluster identification accuracy of SC-Biclust is illustrated through several simulated studies. Application of SC-Biclust to pain research illustrates its ability to identify biologically meaningful subgroups.

KW - biclustering

KW - hierarchical clustering

KW - high-dimensional data

KW - k-means clustering

KW - sparse clustering

UR - http://www.scopus.com/inward/record.url?scp=85074425862&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074425862&partnerID=8YFLogxK

U2 - 10.1111/biom.13136

DO - 10.1111/biom.13136

M3 - Article

C2 - 31424089

AN - SCOPUS:85074425862

JO - Biometrics

JF - Biometrics

SN - 0006-341X

ER -