Indel detection from DNA and RNA sequencing data with transIndel

Rendong Yang, Jamie L. Van Etten, Scott M Dehm

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Background: Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data.

Results: Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression.

Conclusions: Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.
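The abstract describes reconstructing indels from the linear alignments of split reads. The sketch below illustrates that general idea only; it is not the transIndel implementation. It assumes each split read has already been reduced to two aligned segments (for example, a primary alignment plus one supplementary alignment reported by the aligner) with 0-based, half-open coordinates, and that read coordinates are given in the orientation of the alignment. The names `Segment`, `Indel`, and `infer_indel` are hypothetical, and the splice-aware filtering needed for RNA-seq is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One linear alignment of a split read (hypothetical structure)."""
    chrom: str
    strand: str
    read_start: int  # 0-based, inclusive, in alignment orientation
    read_end: int    # exclusive
    ref_start: int   # 0-based, inclusive
    ref_end: int     # exclusive


@dataclass(frozen=True)
class Indel:
    kind: str    # "DEL" or "INS"
    chrom: str
    pos: int     # reference coordinate of the breakpoint (0-based)
    length: int


def infer_indel(a: Segment, b: Segment, min_len: int = 1) -> Optional[Indel]:
    """Infer a simple indel from two segments of the same read.

    Returns None when the pair does not look like a plain insertion or
    deletion (different chromosome or strand, or gaps on both axes).
    """
    if a.chrom != b.chrom or a.strand != b.strand:
        return None

    # Order the two segments by their position along the read.
    left, right = sorted((a, b), key=lambda s: s.read_start)

    # Aligners may assign the same read bases to both segments when the
    # breakpoint has microhomology; trim that overlap from the right segment.
    overlap = max(0, left.read_end - right.read_start)
    if overlap >= right.read_end - right.read_start:
        return None
    right_read_start = right.read_start + overlap
    right_ref_start = right.ref_start + overlap

    read_gap = right_read_start - left.read_end  # unaligned read bases
    ref_gap = right_ref_start - left.ref_end     # skipped reference bases

    if read_gap == 0 and ref_gap >= min_len:
        return Indel("DEL", left.chrom, left.ref_end, ref_gap)
    if ref_gap == 0 and read_gap >= min_len:
        return Indel("INS", left.chrom, left.ref_end, read_gap)
    return None
```

For RNA-seq input, a reference gap found this way could equally be an intron, so a real caller would also need to compare candidate deletions against splice junctions; that step is not sketched here.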

Original language: English (US)
Article number: 270
Journal: BMC Genomics
Volume: 19
Issue number: 1
DOI: 10.1186/s12864-018-4671-4
State: Published - Apr 19 2018

Fingerprint

RNA Sequence Analysis
DNA Sequence Analysis
RNA
Prostatic Neoplasms
RNA Splicing
Benchmarking
Atlases
Genes
Disease Progression
Neoplasms
Genome
Pathology
DNA

Keywords

  • Cancer genome
  • DNA-seq
  • Exitron
  • Indel detection
  • Metastasis
  • RNA-seq
  • TCGA

PubMed: MeSH publication types

  • Journal Article

Cite this

Indel detection from DNA and RNA sequencing data with transIndel. / Yang, Rendong; Van Etten, Jamie L.; Dehm, Scott M.

In: BMC Genomics, Vol. 19, No. 1, 270, 19.04.2018.

@article{14331fa56eb347178f4d9610dea48d77,
title = "Indel detection from DNA and RNA sequencing data with transIndel",
abstract = "Background: Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data. Results: Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Conclusions: Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.",
keywords = "Cancer genome, DNA-seq, Exitron, Indel detection, Metastasis, RNA-seq, TCGA",
author = "Rendong Yang and {Van Etten}, {Jamie L.} and Dehm, {Scott M}",
year = "2018",
month = "4",
day = "19",
doi = "10.1186/s12864-018-4671-4",
language = "English (US)",
volume = "19",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Indel detection from DNA and RNA sequencing data with transIndel

AU - Yang, Rendong

AU - Van Etten, Jamie L.

AU - Dehm, Scott M

PY - 2018/4/19

Y1 - 2018/4/19

AB - Background: Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data. Results: Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Conclusions: Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.

KW - Cancer genome

KW - DNA-seq

KW - Exitron

KW - Indel detection

KW - Metastasis

KW - RNA-seq

KW - TCGA

UR - http://www.scopus.com/inward/record.url?scp=85045571881&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045571881&partnerID=8YFLogxK

U2 - 10.1186/s12864-018-4671-4

DO - 10.1186/s12864-018-4671-4

M3 - Article

C2 - 29673323

AN - SCOPUS:85045571881

VL - 19

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 270

ER -