The properties of high-dimensional data spaces: Implications for exploring gene and protein expression data

Robert Clarke, Habtom W. Ressom, Antai Wang, Jianhua Xuan, Minetta C. Liu, Edmund A. Gehan, Yue Wang

Research output: Contribution to journal, Review article, peer-review

427 Scopus citations


High-throughput genomic and proteomic technologies are widely used in cancer research to build better predictive models of diagnosis, prognosis and therapy, to identify and characterize key signalling networks and to find new targets for drug development. These technologies present investigators with the task of extracting meaningful statistical and biological information from high-dimensional data spaces, wherein each sample is defined by hundreds or thousands of measurements, usually concurrently obtained. The properties of high dimensionality are often poorly understood or overlooked in data modelling and analysis. From the perspective of translational science, this Review discusses the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they can pose for data analysis and interpretation.

Original language: English (US)
Pages (from-to): 37-49
Number of pages: 13
Journal: Nature Reviews Cancer
Issue number: 1
State: Published - Jan 2008
Externally published: Yes

Bibliographical note

Funding Information:
We wish to thank D. J. Miller (Department of Electrical Engineering, The Pennsylvania State University) for critical reading of the manuscript. Some of the issues we discuss may appear overly simplified to experts. Several of the emerging concepts have yet to appear in the biomedical literature and publications might not be accessible through PubMed (but are often found at an author’s or journal’s homepage or at CiteSeer). Many of the engineering and computer science works published in ‘proceedings’ represent peer-reviewed publications. This work was supported in part by Public Health Service grants R01-CA096483, U54-CA100970, R33-EB000830, R33-CA109872, 1P30-CA51008, R03-CA119313, and a U.S. Department of Defense Breast Cancer Research Program award BC030280.

