Machine learning models for lung cancer classification using array comparative genomic hybridization.

C. F. Aliferis, D. Hardin, P. P. Massion

Research output: Contribution to journal, Article, peer-reviewed

23 Scopus citations

Abstract

Array CGH is a recently introduced technology that measures changes in the gene copy number of hundreds of genes in a single experiment. The primary goal of this study was to develop machine learning models that classify non-small cell lung cancers according to histopathology type and to compare several machine learning methods on this learning task. DNA from the tumors of 37 patients (21 squamous carcinomas and 16 adenocarcinomas) was extracted and hybridized onto a 452 BAC clone array. The following algorithms were used: k-nearest neighbors (KNN), decision tree induction, support vector machines, and feed-forward neural networks. Performance was measured via leave-one-out classification accuracy. The best multi-gene model found had a leave-one-out accuracy of 89.2%. Decision trees performed worse than the other methods on this learning task and dataset. We conclude that gene copy numbers as measured by array CGH are, collectively, an excellent indicator of histological subtype. Several interesting research directions are discussed.
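
The leave-one-out protocol named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of a plain Euclidean-distance KNN, and the six toy samples are all assumptions made for illustration, and the toy features are not array CGH data. Each sample is held out once, the classifier is fit on the remaining samples, and accuracy is the fraction of held-out samples predicted correctly. With 37 samples, an accuracy of 89.2% corresponds to 33 correct held-out predictions (33/37 ≈ 0.892).

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Return the majority label among the k training samples nearest to x."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    # Counter keeps first-seen order, so ties go to the nearer neighbor's label.
    return Counter(votes).most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=1):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two well-separated clusters standing in for two subtypes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [0.9, 1.2], [1.1, 0.9]]
y = ["squamous", "squamous", "squamous", "adeno", "adeno", "adeno"]

toy_accuracy = leave_one_out_accuracy(X, y, k=1)
print(f"Toy leave-one-out accuracy: {toy_accuracy:.3f}")
```

Leave-one-out is a natural choice for a dataset of this size because every model is trained on all but one sample, at the cost of fitting as many models as there are samples.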

Original language: English (US)
Pages (from-to): 7-11
Number of pages: 5
Journal: Proceedings / AMIA ... Annual Symposium. AMIA Symposium
State: Published - 2002
