Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies

Serguei V Pakhomov, B. T. McInnes, J. Lamba, Y. Liu, Genevieve B Melton-Meaux, Y. Ghodke, N. Bhise, V. Lamba, Angela K Birnbaum

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with the text of MEDLINE abstracts for a text mining approach to identifying potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieve an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine, and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research.
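To make the terms in the abstract concrete, the following is a minimal sketch of unigram feature extraction and of how sensitivity and specificity are computed from binary labels. It is not the authors' pipeline: the function names, the simple whitespace tokenizer, the placeholder sentence, and the toy labels are all assumptions made for illustration, and the support vector machine training step itself is omitted.

```python
from collections import Counter


def unigram_features(text):
    """Return single-word (unigram) counts for a piece of text.

    Assumption: a naive tokenizer that lowercases, splits on whitespace,
    and strips common punctuation. The paper does not specify its tokenizer.
    """
    tokens = (tok.strip(".,;:()[]\"'") for tok in text.lower().split())
    return Counter(tok for tok in tokens if tok)


def sensitivity_specificity(gold, predicted):
    """Compute (sensitivity, specificity) for binary labels (1 = related, 0 = not)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")
    sensitivity = tp / (tp + fn)  # share of true relations that were found
    specificity = tn / (tn + fp)  # share of non-relations correctly rejected
    return sensitivity, specificity


# Placeholder sentence, not taken from any MEDLINE abstract.
features = unigram_features("Drug X and gene Y; drug X.")

# Toy labels, not data from the study.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(gold, predicted)
```

In a setup like the one described, the gold labels would come from the curated PharmGKB drug-gene relations and the predictions from the trained classifier; the reported 85% and 69% correspond to the two values returned by a function of this kind.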

Original language: English (US)
Pages (from-to): 862-869
Number of pages: 8
Journal: Journal of Biomedical Informatics
Issue number: 5
State: Published - Oct 2012

Bibliographical note

Funding Information:
The authors wish to thank Ted Pedersen for his input during the project. This work was supported by Grant #R01LM009623-01 from the National Library of Medicine.


  • Gene-drug associations
  • Pathway-driven analysis
  • PharmGKB
  • Pharmacogenomics
  • Support vector machine
  • Text mining

