TY - JOUR
T1 - A machine learning approach identifies cellular senescence on transcriptome data of human cells in vitro
AU - Mahmud, Shamsed
AU - Zheng, Chen
AU - Santiago, Fernando
AU - Zhang, Lei
AU - Robbins, Paul D.
AU - Dong, Xiao
N1 - Publisher Copyright: © The Author(s), under exclusive licence to American Aging Association 2024.
PY - 2024
Y1 - 2024
AB - Although cellular senescence has been recognized as a hallmark of aging, it is challenging to detect senescent cells (SnCs) because of their high heterogeneity at the molecular level. Machine learning (ML) is likely well suited to this challenge because it can recognize, in high-dimensional data, complex patterns that cannot be characterized by one or a few features. To test this, we evaluated the performance of four ML algorithms, namely support vector machines (SVM), random forest (RF), decision tree (DT), and Soft Independent Modelling of Class Analogy (SIMCA), in distinguishing SnCs from controls based on bulk RNA sequencing data. The dataset includes 162 in vitro samples covering three human cell types (fibroblasts, melanocytes, and keratinocytes) and three senescence inducers (irradiation, bleomycin treatment, and replication). Under tenfold and leave-one-out cross-validation, as well as independent dataset validation, all methods achieved ~80% or higher accuracy, with SVM exceeding 99%. Similar accuracy was achieved using expert-curated gene lists, e.g., SenMayo and CellAge, instead of our algorithm-prioritized gene list selected by minimum redundancy-maximum relevance (mRMR). However, only a few genes overlapped between the gene sets, suggesting that senescence has a broad impact on the transcriptome. Overall, our study provides a proof of concept for identifying senescence using ML.
KW - Cellular senescence
KW - Machine learning
KW - RNA sequencing
UR - http://www.scopus.com/inward/record.url?scp=85213702821&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85213702821&partnerID=8YFLogxK
U2 - 10.1007/s11357-024-01485-6
DO - 10.1007/s11357-024-01485-6
M3 - Article
C2 - 39738795
AN - SCOPUS:85213702821
SN - 2509-2715
JO - GeroScience
JF - GeroScience
ER -