A Bayesian approach for accurate de novo transcriptome assembly

Xu Shi, Xiao Wang, Andrew F. Neuwald, Leena Halakivi-Clarke, Robert Clarke, Jianhua Xuan

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


De novo transcriptome assembly from billions of RNA-seq reads is very challenging because alternative splicing and varying levels of expression often lead to incorrect, mis-assembled transcripts. BayesDenovo addresses this problem by combining a read-guided strategy, which accurately reconstructs splicing graphs from the RNA-seq data, with a Bayesian strategy that estimates, from these graphs, the probability of transcript expression without penalizing poorly expressed transcripts. Simulation and cell line benchmark studies demonstrate that BayesDenovo is very effective in reducing false positives and achieves much higher accuracy than other assemblers, especially for alternatively spliced genes and for highly or poorly expressed transcripts. Moreover, BayesDenovo is more robust across multiple replicates, assembling a larger portion of the transcripts common to them. When applied to breast cancer data, BayesDenovo identifies phenotype-specific transcripts associated with breast cancer recurrence.

Original language: English (US)
Article number: 17663
Journal: Scientific Reports
Issue number: 1
State: Published - Dec 2021

Bibliographical note

Funding Information:
This work is supported by the National Institutes of Health (NIH) (CA149653, CA164384, CA149147 and GM125878).

Publisher Copyright:
© 2021, The Author(s).

