High-throughput techniques are producing large-scale, high-dimensional genome-wide gene expression data (e.g., 4D data indexed by genes, timepoints, conditions, and tissues). This creates a growing demand for effective methods that partition the data into biologically relevant groups. Current clustering and co-clustering approaches have limitations: they can be very time consuming, and they work only for low-dimensional expression datasets. In this work, we introduce a new notion of "co-identification", which allows systematic identification of genes participating in different functional groups under different conditions or at different developmental stages. The key contribution of our work is a unified computational framework for co-identification that enables clustering to be high-dimensional and adaptive. Our framework is based on a generic optimization model and a general optimization method termed Maximum Block Improvement. Test results on yeast and Arabidopsis expression data are presented to demonstrate the high efficiency and the effectiveness of our approach.
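To make the Maximum Block Improvement (MBI) idea concrete, the following is a minimal sketch on a 2D toy problem, not the framework described in the paper. It assumes a squared-error co-clustering objective with three variable blocks (row labels, column labels, block means); this objective and all function names (`mbi_cocluster`, `best_rows`, `best_cols`, `block_means`) are illustrative assumptions. The one feature it is meant to show is the MBI update rule: at every iteration each block is optimized separately with the others held fixed, and only the block giving the largest improvement is actually updated.

```python
import random


def block_means(X, rows, cols, k, l, old=None):
    """Mean of every (row-cluster, column-cluster) block; empty blocks keep their old value."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, xi in enumerate(X):
        for j, v in enumerate(xi):
            sums[rows[i]][cols[j]] += v
            counts[rows[i]][cols[j]] += 1
    return [[sums[a][b] / counts[a][b] if counts[a][b]
             else (old[a][b] if old else 0.0)
             for b in range(l)] for a in range(k)]


def loss(X, rows, cols, means):
    """Assumed objective: squared deviation of each entry from its block mean."""
    return sum((v - means[rows[i]][cols[j]]) ** 2
               for i, xi in enumerate(X) for j, v in enumerate(xi))


def best_rows(X, cols, means, k):
    """Optimal row labels with column labels and block means held fixed."""
    out = []
    for xi in X:
        costs = [sum((v - means[a][cols[j]]) ** 2 for j, v in enumerate(xi))
                 for a in range(k)]
        out.append(costs.index(min(costs)))
    return out


def best_cols(X, rows, means, l):
    """Optimal column labels with row labels and block means held fixed."""
    out = []
    for j in range(len(X[0])):
        costs = [sum((X[i][j] - means[rows[i]][b]) ** 2 for i in range(len(X)))
                 for b in range(l)]
        out.append(costs.index(min(costs)))
    return out


def mbi_cocluster(X, k, l, seed=0, max_iter=100, tol=1e-12):
    """Toy MBI loop: update only the single block with the largest improvement."""
    rng = random.Random(seed)
    rows = [rng.randrange(k) for _ in X]
    cols = [rng.randrange(l) for _ in X[0]]
    means = block_means(X, rows, cols, k, l)
    current = loss(X, rows, cols, means)
    history = [current]
    for _ in range(max_iter):
        # One candidate per block, each optimal for that block alone.
        candidates = [
            (best_rows(X, cols, means, k), cols, means),
            (rows, best_cols(X, rows, means, l), means),
            (rows, cols, block_means(X, rows, cols, k, l, old=means)),
        ]
        scored = [(loss(X, r, c, m), r, c, m) for r, c, m in candidates]
        best = min(scored, key=lambda t: t[0])
        if current - best[0] <= tol:
            break  # no block improves the objective any further
        current, rows, cols, means = best
        history.append(current)
    return rows, cols, means, history
```

Calling `mbi_cocluster(X, k=2, l=2)` on a small list-of-lists matrix returns the row labels, column labels, block means, and the objective value after each accepted update. Because every accepted step is an exact block minimization, the recorded objective never increases, although the result can still be a local optimum that depends on the random start.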
|Original language||English (US)|
|Title of host publication||Pattern Recognition in Bioinformatics - 7th IAPR International Conference, PRIB 2012, Proceedings|
|Number of pages||12|
|State||Published - 2012|
|Event||7th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2012 - Tokyo, Japan|
Duration: Nov 8 2012 → Nov 10 2012
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Other||7th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2012|
|Period||11/8/12 → 11/10/12|
Bibliographical note
Funding Information:
This research is supported by grants from NIH NCRR (5P20RR016460-11) and NIGMS (8P20GM103429-11).