Abstract
The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with the text of MEDLINE abstracts for a text mining approach to identifying potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieve an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research.
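For readers less familiar with the two evaluation measures reported in the abstract, the following is a minimal sketch of how sensitivity and specificity can be computed from binary classifier output. The function name `sensitivity_specificity` and the toy label lists are illustrative assumptions, not the authors' code or data; a label of 1 is assumed to mean "abstract supports a drug-gene relation".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: fraction of actual positives that the classifier recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that the classifier rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for illustration only; these are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A classifier with high sensitivity but moderate specificity, as reported here, recovers most known relations while also flagging additional candidates, which is why the suggested gene targets were passed to manual review.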
Original language | English (US) |
---|---|
Pages (from-to) | 862-869 |
Number of pages | 8 |
Journal | Journal of Biomedical Informatics |
Volume | 45 |
Issue number | 5 |
State | Published - Oct 2012 |
Bibliographical note
Funding Information: The authors wish to thank Ted Pedersen for his input during the project. This work was supported by Grant #R01LM009623-01 from the National Library of Medicine.
Keywords
- Gene-drug associations
- Pathway-driven analysis
- PharmGKB
- Pharmacogenomics
- Support vector machine
- Text mining