Local sparse bump hunting

Jean Eudes Dazard, J. Sunil Rao

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

The search for structures in real datasets, for example, in the form of bumps, components, classes, or clusters, is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without prespecifying their total number. A number of related methods already exist, yet are challenged in the context of high-dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer microarray dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online.
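
For readers unfamiliar with PRIM, the top-down peeling idea at its core can be sketched roughly as below. This is only a minimal illustration of generic greedy peeling, not the authors' local sparse bump hunting procedure: it leaves out the tree-based partitioning, the dimension reduction step, PRIM's bottom-up pasting, and any meta-parameter estimation. The function name `prim_peel`, the parameters `alpha` and `min_support`, and the synthetic data are all assumptions made for this example.

```python
import random

def prim_peel(X, y, alpha=0.1, min_support=0.1):
    """Greedy top-down peeling (illustrative sketch, names are hypothetical).

    Starting from the box that contains every point, repeatedly remove a
    fraction `alpha` of the remaining points from one face of the box,
    choosing the face whose removal gives the highest mean of y inside.
    Stops before the box would hold fewer than `min_support` of all points.
    """
    n, p = len(X), len(X[0])
    inside = list(range(n))
    box = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(p)]
    while True:
        k = int(alpha * len(inside))          # points peeled in this step
        if k < 1 or len(inside) - k < min_support * n:
            break
        best = None
        for j in range(p):
            order = sorted(inside, key=lambda i: X[i][j])
            # Candidate peels: drop the k lowest or the k highest along j.
            for side, keep in (("lower", order[k:]), ("upper", order[:-k])):
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, j, side, keep)
        _, j, side, inside = best
        vals = [X[i][j] for i in inside]
        lo, hi = box[j]
        box[j] = (min(vals), hi) if side == "lower" else (lo, max(vals))
    return box, inside

# Synthetic example (assumed data): a single bump in the middle of the unit square.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(500)]
y = [1.0 if 0.4 <= a <= 0.6 and 0.4 <= b <= 0.6 else 0.0 for a, b in X]

box, inside = prim_peel(X, y)
box_mean = sum(y[i] for i in inside) / len(inside)
overall_mean = sum(y) / len(y)
```

In this toy setting the returned box should concentrate on the region where `y` is large, so `box_mean` ends up above `overall_mean`. The peeling fraction and minimum support here are fixed by hand, whereas the abstract states that the paper shows how to estimate the PRIM meta-parameters.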

Original language: English (US)
Pages (from-to): 900-929
Number of pages: 30
Journal: Journal of Computational and Graphical Statistics
Volume: 19
Issue number: 4
State: Published - Dec 2010
Externally published: Yes

Bibliographical note

Funding Information:
The authors are grateful to two anonymous referees, the associate editor, and the editor for valuable comments and suggestions. This research was conducted in part while J.-E. Dazard was a postdoctoral fellow in the Division of Biostatistics, mentored by J. Sunil Rao under NIH grant R25-CA04186. J. Sunil Rao was partially supported by NSF grant DMS-0405072 and by NIH grant K25-CA89867. Additional support came from grants of the Case Comprehensive Cancer Center (NIH-National Cancer Institute P30-CA043703) and the Clinical and Translational Science Award (NIH-National Center for Research Resources UL1-RR024989).

Keywords

  • Classification
  • Clustering
  • Density estimation
  • Mode/class discovery
  • Patient rule induction method
  • Sparse principal components
