Abstract
Chemometric applications in biology, for instance in omics and environmental analysis, now routinely deal with tens of thousands of variables. Other areas of chemometrics also face the task of distilling relevant information from highly information-rich data sets. Traditional tools such as principal component analysis or hierarchical clustering are often not optimal for providing succinct and accurate information from high-rank data sets. A relatively little-known approach that has shown significant potential in other areas of research is coclustering, in which a data matrix is clustered simultaneously along its rows and columns (usually objects and variables). Coclustering is the tool of choice when only a subset of variables is related to a specific grouping among objects. Hence, coclustering allows a select number of objects to share a particular behavior on a select number of variables. In this paper, we describe the basics of coclustering and use three example data sets to show its advantages and shortcomings.
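The idea that a subset of objects shares a behavior on only a subset of variables can be illustrated with a toy sketch. The code below is not the method described in the paper: it is a simple alternating thresholding heuristic, and the function name `find_cocluster`, the `threshold` parameter, and the data matrix `X` are all hypothetical choices made for illustration. It assumes a single cocluster of elevated values on a near-zero background.

```python
def find_cocluster(X, threshold=0.5, max_iter=20):
    """Toy heuristic (an assumption for illustration, not the paper's algorithm).

    Alternately keeps the columns whose mean over the selected rows exceeds
    `threshold`, then the rows whose mean over the selected columns exceeds
    `threshold`, until the selection stops changing.
    """
    n_rows, n_cols = len(X), len(X[0])
    rows = list(range(n_rows))  # start from all objects
    cols = list(range(n_cols))  # start from all variables
    for _ in range(max_iter):
        # Variables that are elevated, on average, over the selected objects.
        new_cols = [j for j in range(n_cols)
                    if sum(X[i][j] for i in rows) / len(rows) > threshold]
        if not new_cols:
            break
        # Objects that are elevated, on average, over the selected variables.
        new_rows = [i for i in range(n_rows)
                    if sum(X[i][j] for j in new_cols) / len(new_cols) > threshold]
        if not new_rows:
            break
        if new_rows == rows and new_cols == cols:
            break  # selection is stable
        rows, cols = new_rows, new_cols
    return rows, cols


# Hypothetical data: 6 objects x 5 variables. Objects 0-2 share elevated
# values on variables 1 and 3 only; everything else is near zero.
X = [
    [0.1, 2.0, 0.0, 2.1, 0.2],
    [0.0, 1.9, 0.1, 2.2, 0.1],
    [0.2, 2.1, 0.0, 1.8, 0.0],
    [0.1, 0.0, 0.2, 0.1, 0.1],
    [0.0, 0.2, 0.1, 0.0, 0.2],
    [0.2, 0.1, 0.0, 0.1, 0.0],
]

rows, cols = find_cocluster(X)
print(rows, cols)  # [0, 1, 2] [1, 3]
```

Clustering the objects on all five variables would dilute the signal with three uninformative variables; selecting rows and columns jointly isolates the block where the shared behavior actually occurs.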
Original language | English (US) |
---|---|
Pages (from-to) | 256-263 |
Number of pages | 8 |
Journal | Journal of Chemometrics |
Volume | 26 |
Issue number | 6 |
State | Published - Jun 1 2012 |
Keywords
- Clustering
- Coclustering
- L1 norm
- Sparsity