Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Serguei V. Pakhomov, James Buntrock, Christopher G. Chute

    Research output: Contribution to journal › Article › peer-review

    45 Scopus citations


    This paper addresses the specific problem of identifying patients diagnosed with a given condition for potential recruitment into a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and related conditions by automatically classifying clinical notes dictated at Mayo Clinic. The method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. Documents are represented as feature vectors whose features combine demographic information with single words and concept mappings to the MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term-spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, Naïve Bayes yields better recall on positive samples than Perceptron (95 vs. 86%) but worse accuracy (57 vs. 65%). Both algorithms outperform the baseline, which achieves 71% recall on positive samples and 54% accuracy.
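The evaluation described in the abstract rests on two measures, accuracy and recall on positive samples, applied to a binary note classifier. The following is a minimal sketch of that setup, not the authors' implementation. It assumes whitespace-tokenized bag-of-words features only (no demographic fields and no MeSH or HICDA concept mappings), a plain perceptron update rule, and invented toy notes. All function names are hypothetical.

```python
from collections import Counter

def featurize(note):
    """Bag-of-words features (assumption: lowercase whitespace tokens)."""
    return Counter(note.lower().split())

def score(model, feats):
    """Linear score: bias plus the weighted sum of feature counts."""
    weights, bias = model
    return bias + sum(weights[f] * v for f, v in feats.items())

def predict(model, feats):
    """Return +1 (positive note) or -1 (negative note)."""
    return 1 if score(model, feats) > 0 else -1

def train_perceptron(samples, epochs=10):
    """Train on (features, label) pairs with labels in {+1, -1}."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, label in samples:
            if predict((weights, bias), feats) != label:
                # Mistake-driven update: move weights toward the true label.
                for f, v in feats.items():
                    weights[f] += label * v
                bias += label
    return weights, bias

def accuracy_and_positive_recall(gold, predicted):
    """Accuracy over all samples; recall over gold-positive samples only."""
    pairs = list(zip(gold, predicted))
    correct = sum(g == p for g, p in pairs)
    true_pos = sum(g == 1 and p == 1 for g, p in pairs)
    positives = sum(g == 1 for g in gold)
    recall = true_pos / positives if positives else 0.0
    return correct / len(gold), recall

# Invented toy notes, for illustration only.
train = [
    ("patient has congestive heart failure", 1),
    ("heart failure with edema", 1),
    ("routine knee pain follow up", -1),
    ("seasonal allergy visit", -1),
]
model = train_perceptron([(featurize(text), label) for text, label in train])
```

Recall on positive samples matters for recruitment because a missed eligible patient is lost to the study, whereas a false positive only costs a manual review. That trade-off is consistent with the abstract reporting both measures side by side.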

    Original language: English (US)
    Pages (from-to): 145-153
    Number of pages: 9
    Journal: Journal of Biomedical Informatics
    Issue number: 2
    State: Published - Apr 2005


    • Automatic classification
    • Congestive heart failure
    • Machine learning
    • Medical informatics
    • Natural language processing
    • Naïve Bayes
    • Perceptron

