Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data

Baolin Wu, Tom Abbott, David Fishman, Walter McMurray, Gil Mor, Kathryn Stone, David Ward, Kenneth Williams, Hongyu Zhao

Research output: Contribution to journal › Article › peer-review

379 Scopus citations


Motivation: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data.

Results: We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.

Original language: English (US)
Pages (from-to): 1636-1643
Number of pages: 8
Issue number: 13
State: Published - Sep 1 2003

Bibliographical note

Funding Information:
We thank the reviewers for their constructive comments. This research is supported by NIH grants NHLBI N01-HV-28186, R01 GM59507, RR015837, NCI-EDRN, NCI U01 CA-98-028, and DOE grant DE-FG02-02ER63462.

