Statistical methods for tissue array images - algorithmic scoring and co-training

Donghui Yan, Pei Wang, Michael Linden, Beatrice Knudsen, Timothy Randolph

Research output: Contribution to journal › Article › peer-review

9 Scopus citations


Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high-throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm, Tissue Array Co-Occurrence Matrix Analysis (TACOMA), for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, has no sensitive tuning parameters, and can report the salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm, allowing it to be trained for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., of size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is "sufficient" redundancy among the features. TACOMA is flexible and transparent, and it provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists in terms of accuracy and repeatability.
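To make "local inter-pixel relationships" concrete, the following is a minimal sketch of a gray-level co-occurrence matrix, the general kind of summary the abstract refers to. It is an illustration under stated assumptions, not the authors' implementation: the function name `cooccurrence_matrix`, the single horizontal offset, the four gray levels and the toy patch are all hypothetical choices, and the abstract does not specify how TACOMA quantizes images, which offsets it uses, or how the resulting counts feed the classifier.

```python
def cooccurrence_matrix(image, offset=(0, 1), levels=None):
    """Count ordered pairs of gray levels (i, j) found at a fixed pixel offset.

    image  : list of rows, each a list of non-negative integer gray levels
    offset : (row_step, col_step) from the first pixel to its neighbor
             (assumed here: one pixel to the right)
    levels : number of gray levels; inferred from the image when omitted
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1

    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Skip pairs whose neighbor falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


# Hypothetical 4x4 patch already quantized to gray levels 0..3.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

glcm = cooccurrence_matrix(patch, offset=(0, 1), levels=4)
for row in glcm:
    print(row)
```

Each entry `glcm[i][j]` counts how often a pixel of level `i` has a right-hand neighbor of level `j`, so a patch with regular texture concentrates its counts in a few entries. Flattening such a matrix gives a feature vector that a classifier could be trained on, which is one plausible reading of how a texture summary supports scoring.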

Original language: English (US)
Pages (from-to): 1280-1305
Number of pages: 26
Journal: Annals of Applied Statistics
Issue number: 3
State: Published - Sep 2012


Keywords

  • Classification
  • Co-training
  • High-dimensional inference
  • Ratio of separation

