Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second-generation sequencing is challenging because of the relatively short reads the technology produces. Existing indel calling methods each detect only a limited range of indel sizes, with varying accuracy and resolution. We present ScanIndel, an integrated framework that detects indels by combining multiple heuristics: gapped alignment, split reads and de novo assembly. Using simulated data, we show that ScanIndel achieves higher sensitivity and specificity than several state-of-the-art indel callers across a range of coverage levels and indel sizes. On both targeted resequencing data from tumor specimens and high-coverage whole-genome sequencing data from the NIST human reference sample NA12878, ScanIndel yields higher predictive accuracy at lower computational cost than existing tools. We therefore anticipate that ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python and is freely available for academic use at https://github.com/cauyrd/ScanIndel.
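To make the first of these heuristics concrete, the sketch below shows how gapped alignment reports indels: small insertions and deletions are encoded directly as `I` and `D` operations in an aligned read's SAM CIGAR string. This is a minimal, hypothetical illustration, not ScanIndel's code; the function names `indels_from_cigar` and `soft_clipped_bases`, the `min_len` filter and the anchor-base position convention are our own assumptions. It also hints at why gapped alignment alone is not enough: an indel too long for the aligner to place inside a read typically appears as soft clipping (`S`) instead, which is the kind of evidence split-read and assembly approaches are designed to recover.

```python
import re
from typing import List, NamedTuple

# SAM CIGAR operations: M, I, D, N, S, H, P, =, X
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases (advance the reference position).
REF_CONSUMING = set("MDN=X")


class Indel(NamedTuple):
    kind: str    # "INS" or "DEL"
    pos: int     # 1-based reference position of the base preceding the event (assumed convention)
    length: int  # number of inserted or deleted bases


def indels_from_cigar(ref_start: int, cigar: str, min_len: int = 1) -> List[Indel]:
    """Hypothetical helper: return indels encoded in one read's CIGAR string.

    ref_start is the 1-based leftmost aligned reference position (SAM POS).
    """
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing anything other than well-formed length/op pairs.
    if "".join(num + op for num, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    calls = []
    ref_pos = ref_start  # next reference base to be consumed
    for num, op in ops:
        n = int(num)
        if op == "I" and n >= min_len:
            calls.append(Indel("INS", ref_pos - 1, n))
        elif op == "D" and n >= min_len:
            calls.append(Indel("DEL", ref_pos - 1, n))
        if op in REF_CONSUMING:
            ref_pos += n
    return calls


def soft_clipped_bases(cigar: str) -> int:
    """Hypothetical helper: total soft-clipped bases, a possible hint of a larger indel."""
    return sum(int(num) for num, op in CIGAR_RE.findall(cigar) if op == "S")


if __name__ == "__main__":
    # A read aligned at position 100 carrying a 3-bp deletion and a 2-bp insertion.
    print(indels_from_cigar(100, "20M3D30M2I45M"))
    # A read with 30 soft-clipped bases and no gapped-alignment indel.
    print(indels_from_cigar(100, "30S70M"), soft_clipped_bases("30S70M"))
```

In this sketch the deletion in `20M3D30M2I45M` removes reference bases 120 to 122, so it is reported at anchor position 119; a real caller would additionally aggregate such evidence across many reads before making a call.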
Original language: English (US)
State: Published - Dec 7 2015
Bibliographical note
Funding Information:
We thank Matt Schomaker and Aaron Lambert, who generated the sequencing data on the MiSeq. We thank Getiria Onsongo for helping to validate the indel detection pipeline. We also thank the Minnesota Supercomputing Institute for providing computing resources and infrastructure. This work was supported by funds from the Department of Laboratory Medicine and Pathology at the University of Minnesota. R.Y. is supported by a Young Investigator Award from the PCF.
© 2015 Yang et al.