Big Data Cohort Extraction to Facilitate Machine Learning to Improve Statin Treatment

Chih Lin Chi, Jin Wang, Thomas R. Clancy, Jennifer G. Robinson, Peter J. Tonellato, Terrence J. Adam

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


Health care Big Data studies hold substantial promise for improving clinical practice. Among analytic tools, machine learning (ML) is an important approach that many industries have adopted for data-driven decision support. Big Data sets commonly contain thousands of variables and millions of patient records, but most data elements cannot be used directly to support decision making. Although many feature-selection tools can help identify relevant data, these tools are typically insufficient for determining the patient cohort from which a model should learn. Therefore, domain experts with nursing or clinical knowledge play a critical role in defining the value criteria and the types of variables that should be included in the patient cohort to maximize project success. We demonstrate this process by extracting a patient cohort of 37,506 individuals from the 130 million de-identified lives in the OptumLabs™ Data Warehouse to support our ML work (i.e., developing a proactive strategy to prevent statin adverse events).
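The abstract does not list the authors' actual selection criteria. As a minimal sketch under stated assumptions, the Python below illustrates the general idea of applying expert-defined inclusion criteria in sequence to patient records and recording how many patients remain after each step (an attrition count). The `PatientRecord` fields, the `CRITERIA` list, and `extract_cohort` are hypothetical illustrations, not the study's criteria or the OptumLabs data schema.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical, simplified record layout (not the OptumLabs schema).
    patient_id: str
    age: int
    on_statin: bool        # hypothetical: any statin prescription on file
    enrolled_months: int   # hypothetical: length of continuous enrollment
    has_lipid_lab: bool    # hypothetical: at least one lipid panel on file


# A criterion pairs a human-readable name with a keep/drop predicate.
Criterion = Tuple[str, Callable[[PatientRecord], bool]]

# Hypothetical expert-defined inclusion criteria, applied in this order.
CRITERIA: List[Criterion] = [
    ("adult (age >= 18)", lambda r: r.age >= 18),
    ("statin user", lambda r: r.on_statin),
    ("enrolled >= 12 months", lambda r: r.enrolled_months >= 12),
    ("has lipid lab result", lambda r: r.has_lipid_lab),
]


def extract_cohort(
    records: List[PatientRecord], criteria: List[Criterion]
) -> Tuple[List[PatientRecord], List[Tuple[str, int]]]:
    """Apply each criterion in turn; return the cohort and per-step counts."""
    remaining = list(records)
    attrition: List[Tuple[str, int]] = []
    for name, keep in criteria:
        remaining = [r for r in remaining if keep(r)]
        attrition.append((name, len(remaining)))
    return remaining, attrition


# Usage on a few made-up records.
records = [
    PatientRecord("p1", 55, True, 24, True),
    PatientRecord("p2", 16, True, 30, True),   # fails age criterion
    PatientRecord("p3", 62, False, 40, True),  # fails statin criterion
    PatientRecord("p4", 70, True, 6, True),    # fails enrollment criterion
    PatientRecord("p5", 48, True, 18, False),  # fails lab criterion
]
cohort, attrition = extract_cohort(records, CRITERIA)
for name, n in attrition:
    print(f"{name}: {n} remaining")
```

Keeping the criteria as named, ordered predicates makes each expert decision explicit and reviewable, and the attrition counts show how much each rule narrows the population before any feature selection or model training takes place.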

Original language: English (US)
Pages (from-to): 42-62
Number of pages: 21
Journal: Western Journal of Nursing Research
Issue number: 1
State: Published - Jan 1 2017

Bibliographical note

Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors acknowledge support from an AHC Seed Grant through the University of Minnesota and from NIH grant 1R01LM011566-01.

Publisher Copyright:
© 2016, © The Author(s) 2016.


  • Big Data
  • cohort extraction
  • machine learning
  • statin treatment
  • translational research


