Mining sequence data is increasingly important for biocatalysis research. However, when relying on sequence data alone, the reaction catalyzed by a specific protein sequence often cannot be predicted reliably, and predicting substrate specificity is far from trivial. This study demonstrates an approach that combines sequence data with structures of distant homologs to target the identification of new nitrilases that specifically utilize hindered nitrile substrates such as mandelonitrile. A total of 212 non-identical target nitrilases were identified in GenBank. Evolutionary trace and sequence clustering methods were used in combination to identify a subset of nitrilases with presumably distinct substrate specificities. Selected encoding genes were cloned into Escherichia coli. Recombinant E. coli expressing NitA (gi91784632) from Burkholderia xenovorans LB400 was able to grow on glutaronitrile or adiponitrile as the sole nitrogen source. Purified NitA exhibited its highest activity with mandelonitrile, with a catalytic efficiency (kcat/Km) of 3.6 × 10⁴ M⁻¹ s⁻¹. A second nitrilase predicted by our analysis, from Bradyrhizobium japonicum USDA 110 (gi27381513), was likewise shown to prefer mandelonitrile [Zhu, D., Mukherjee, C., Biehl, E.R., Hua, L., 2007. Discovery of a mandelonitrile hydrolase from Bradyrhizobium japonicum USDA110 by rational genome mining. J. Biotechnol. 129 (4), 645–650]. Thus, predictions based on sequence analysis and distant superfamily structures yielded enzymes with high selectivity for mandelonitrile. These data suggest that similar data-mining techniques can be used to identify other substrate-specific enzymes among published, unannotated sequences.
- Genome mining