TY - JOUR
T1 - COT
T2 - an efficient and accurate method for detecting marker genes among many subtypes
AU - Lu, Yingzhou
AU - Wu, Chiung Ting
AU - Parker, Sarah J.
AU - Cheng, Zuolin
AU - Saylor, Georgia
AU - Van Eyk, Jennifer E.
AU - Yu, Guoqiang
AU - Clarke, Robert
AU - Herrington, David M.
AU - Wang, Yue
N1 - Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press.
PY - 2022
Y1 - 2022
N2 - Motivation: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest but in no others; these are so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtypes. We and others have recognized that the test statistics used by most methods do not exactly satisfy the MG definition and often identify inaccurate MGs. Results: We report an efficient and accurate data-driven method, formulated as a Cosine-based One-sample Test (COT) in scatter space, to detect MGs among many subtypes using subtype expression profiles. Fundamentally different from existing approaches, the test statistic in COT precisely matches the mathematical definition of an ideal MG. We demonstrate the performance and utility of COT on both simulated and real gene expression and proteomics data. The open-source Python/R tool will allow biologists to efficiently detect MGs and perform a more comprehensive and unbiased molecular characterization of tissue or cell subtypes in many biomedical contexts. Nevertheless, COT complements, rather than replaces, existing methods.
UR - http://www.scopus.com/inward/record.url?scp=85148558291&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85148558291&partnerID=8YFLogxK
U2 - 10.1093/bioadv/vbac037
DO - 10.1093/bioadv/vbac037
M3 - Article
C2 - 35673616
AN - SCOPUS:85148558291
SN - 2635-0041
VL - 2
JO - Bioinformatics Advances
JF - Bioinformatics Advances
IS - 1
M1 - vbac037
ER -