Abstract
Advances in sequencing technology have allowed researchers to sequence DNA with greater ease and at decreasing cost. Most developments have focused on sequencing either many short sequences or fewer long sequences. Methods for sequencing mid-sized sequences of 600–5,000 bp are currently less efficient. For example, the PacBio Sequel I system yields ~100,000–300,000 reads with a per-base accuracy of 90–99%. We sought to sequence several DNA populations of ~870 bp in length with a sequencing accuracy of 99% and to the greatest depth possible. We optimised a simple, robust method to concatenate genes of ~870 bp five times and then sequenced the resulting DNA of ~5,000 bp by PacBio SMRT long-read sequencing. Our method improved upon previously published concatenation attempts, leading to greater sequencing depth and high-quality reads with limited sample preparation and little expense. We applied this efficient concatenation protocol to sequence nine DNA populations from a protein engineering study. The improved method is accompanied by a simple and user-friendly analysis pipeline, DeCatCounter, to sequence medium-length sequences efficiently at one-fifth of the cost.
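The arithmetic behind the one-fifth cost figure can be sketched as follows. This is a rough illustration using only the approximate numbers quoted in the abstract; it assumes, as an idealisation not stated in the source, that every read spans a complete five-gene concatemer. The function names are hypothetical and are not part of DeCatCounter.

```python
# Back-of-the-envelope arithmetic using the approximate figures from the abstract.
# Assumption (not from the source): every read covers a full five-gene concatemer.
GENE_LENGTH_BP = 870                 # approximate length of one gene
COPIES_PER_CONCATEMER = 5            # genes joined into one concatemer
READS_PER_RUN = (100_000, 300_000)   # approximate PacBio Sequel I read yield


def coding_bases_per_concatemer(gene_bp: int = GENE_LENGTH_BP,
                                copies: int = COPIES_PER_CONCATEMER) -> int:
    """Bases contributed by the gene copies alone (excludes any other sequence)."""
    return gene_bp * copies


def genes_per_run(reads: int, copies: int = COPIES_PER_CONCATEMER) -> int:
    """Upper bound on gene sequences per run under the full-concatemer assumption."""
    return reads * copies


def relative_cost_per_gene(copies: int = COPIES_PER_CONCATEMER) -> float:
    """Cost per gene sequence relative to sequencing one gene per read."""
    return 1 / copies


if __name__ == "__main__":
    low, high = (genes_per_run(r) for r in READS_PER_RUN)
    print(f"Gene bases per concatemer: {coding_bases_per_concatemer():,} bp")
    print(f"Gene sequences per run (upper bound): {low:,} to {high:,}")
    print(f"Relative cost per gene sequence: {relative_cost_per_gene():.2f}")
```

In practice the recovered depth would be lower than this upper bound, since not every read yields five usable gene copies.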
Original language | English (US) |
---|---|
Article number | 18065 |
Journal | Scientific Reports |
Volume | 11 |
Issue number | 1 |
State | Published - Dec 2021 |
Bibliographical note
Funding Information: This work was funded in part by grants from the US National Institutes of Health (5R01GM108703-04, 7DP2GM123457-02), the Simons Foundation (340762, 290356), the National Aeronautics and Space Administration (80NSSC18K1277 and 80NSSC21K0595), the Minnesota Medical Foundation (4036–9663-10), the University of Minnesota Biocatalysis Initiative, and the Office of the VP of Research at the University of Minnesota (Grant-in-Aid). Sequencing was carried out by the DNA Technologies and Expression Analysis Cores at the UC Davis Genome Center, supported by NIH Shared Instrumentation Grant 1S10OD010786-01. We thank Oanh Nguyen from UC Davis Genome Center for useful discussion and troubleshooting efforts and Benjamin Auch and Archana Despande from the University of Minnesota Genomic Center for useful discussions and support with project design and SMRT Link software v8.0. We thank S.E. Erickson for testing the DeCatCounter script and feedback on its documentation, and S.E. Erickson, S.P. Miller, and C.L. Tong for careful reading of the manuscript.
Publisher Copyright:
© 2021, The Author(s).