Missing value estimation for DNA microarray gene expression data: Local least squares imputation

Hyunsoo Kim, Gene H. Golub, Haesun Park

Research output: Contribution to journal › Article › peer-review

376 Scopus citations

Abstract

Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed because many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data. These methods exploit local similarity structures in the data as well as a least squares optimization process.

Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen as the k nearest neighbors or as the k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method based on LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data.
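The core idea in the abstract, expressing a target gene as a linear combination of its k most similar genes, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `lls_impute_row` is hypothetical, similarity is taken to be Euclidean distance over the target's observed columns, only genes with no missing values are used as candidates, and k is supplied by the caller rather than estimated automatically.

```python
import numpy as np


def lls_impute_row(data, target, k):
    """Fill the NaNs of one gene (row) from its k nearest complete genes.

    Assumptions (not taken from the paper): Euclidean distance for
    similarity, complete rows only as candidates, fixed k.
    """
    row = data[target]
    missing = np.isnan(row)
    if not missing.any():
        return row.copy()
    observed = ~missing

    # Candidate genes: every row without missing values, excluding the target.
    complete = ~np.isnan(data).any(axis=1)
    complete[target] = False
    candidates = data[complete]

    # Rank candidates by distance to the target over its observed columns.
    dists = np.linalg.norm(candidates[:, observed] - row[observed], axis=1)
    nearest = candidates[np.argsort(dists)[:k]]

    A = nearest[:, observed]  # k x (number of observed columns)
    B = nearest[:, missing]   # k x (number of missing columns)

    # Least squares: find coefficients x minimizing ||A^T x - w||_2,
    # where w holds the target's observed values.
    x, *_ = np.linalg.lstsq(A.T, row[observed], rcond=None)

    # Apply the same linear combination to the neighbors' values
    # in the columns where the target is missing.
    filled = row.copy()
    filled[missing] = B.T @ x
    return filled
```

Replacing the distance ranking with a ranking by absolute Pearson correlation would give the "k coherent genes" variant mentioned in the abstract; the least squares step stays the same.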

Original language: English (US)
Pages (from-to): 187-198
Number of pages: 12
Journal: Bioinformatics
Volume: 21
Issue number: 2
State: Published - Jan 15 2005

Bibliographical note

Funding Information:
The authors would like to thank the University of Minnesota Supercomputing Institute (MSI) for providing the computing facilities. We also thank Dr Shigeyuki Oba for providing datasets and helpful discussions. The work of H.P. has been performed while at the NSF and was partly supported by IR/D from the National Science Foundation (NSF). This material is based upon the work supported in part by the National Science Foundation Grants CCR-0204109 and ACI-0305543. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
