Background: Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes.Results: We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness.Conclusions: This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.
Bibliographical noteFunding Information:
We thank the Center for Genomics and Bioinformatics at Indiana University and its staff, especially Keithanne Mockaitis and John Colborne for their help and expertise in executing this study. Funding for this study was provided by National Science Foundation grants IOS 0820411 to JA and APM and IOS 4824311 to APM. ESR was supported by NIH NRSA F32GM083830. Additional funding for work in the Center for Genomics and Bioinformatics, was provided in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, as well as the National Research Foundation of Korea [NRF-2009-352-D00275] to HS.