TY - JOUR
T1 - A Bayesian approach for accurate de novo transcriptome assembly
AU - Shi, Xu
AU - Wang, Xiao
AU - Neuwald, Andrew F.
AU - Halakivi-Clarke, Leena
AU - Clarke, Robert
AU - Xuan, Jianhua
N1 - Funding Information:
This work is supported by the National Institutes of Health (NIH) (CA149653, CA164384, CA149147 and GM125878).
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
AB - De novo transcriptome assembly from billions of RNA-seq reads is very challenging due to alternative splicing and various levels of expression, which often leads to incorrect, mis-assembled transcripts. BayesDenovo addresses this problem by using both a read-guided strategy to accurately reconstruct splicing graphs from the RNA-seq data and a Bayesian strategy to estimate, from these graphs, the probability of transcript expression without penalizing poorly expressed transcripts. Simulation and cell line benchmark studies demonstrate that BayesDenovo is very effective in reducing false positives and achieves much higher accuracy than other assemblers, especially for alternatively spliced genes and for highly or poorly expressed transcripts. Moreover, BayesDenovo is more robust on multiple replicates by assembling a larger portion of common transcripts. When applied to breast cancer data, BayesDenovo identifies phenotype-specific transcripts associated with breast cancer recurrence.
UR - http://www.scopus.com/inward/record.url?scp=85114631914&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114631914&partnerID=8YFLogxK
U2 - 10.1038/s41598-021-97015-x
DO - 10.1038/s41598-021-97015-x
M3 - Article
C2 - 34480063
AN - SCOPUS:85114631914
SN - 2045-2322
VL - 11
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 17663
ER -