Big data cohort extraction for personalized statin treatment and machine learning

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Scopus citations


Creating big clinical data cohorts for machine learning and data analysis requires a number of steps from start to successful completion. As with data set preprocessing in other fields, data quality must be evaluated first; with large, heterogeneous clinical data sets, it is also important to standardize the data in order to facilitate dimensionality reduction. Standardization is particularly important for clinical data sets that include medications as a core data component, because coded medication data are complex. Data integration at the individual subject level is essential for medication-related machine learning applications, since it is difficult to accurately identify drug exposures, therapeutic effects, and adverse drug events without high-quality integration of insurance, medication, and medical data. Successful data integration and standardization efforts can substantially improve the ability to identify and replicate personalized treatment pathways that optimize drug therapy.
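
As an illustration only, the subject-level integration and medication standardization described in the abstract might be sketched as follows. This is not the chapter's method. The record layouts, the made-up product and diagnosis codes, the code-to-ingredient mapping, and the function name `build_cohort` are all assumptions introduced for this example, and the sketch assumes a single insurance coverage window per subject.

```python
from datetime import date

# Hypothetical mapping from product-level medication codes to ingredients.
# Collapsing many product codes into one ingredient is the standardization
# step that reduces dimensionality. These codes are invented placeholders.
CODE_TO_INGREDIENT = {
    "PROD-001": "atorvastatin",
    "PROD-002": "atorvastatin",
    "PROD-003": "simvastatin",
}


def build_cohort(enrollment, fills, diagnoses, code_map=CODE_TO_INGREDIENT):
    """Integrate insurance, medication, and medical records per subject."""
    cohort = {}
    # Insurance enrollment defines who is in the cohort and when they are observable.
    for row in enrollment:
        cohort[row["subject_id"]] = {
            "coverage": (row["start"], row["end"]),
            "ingredients": set(),
            "diagnoses": set(),
            "unmapped_codes": set(),
        }
    # Medication fills count as exposure only inside the coverage window.
    for fill in fills:
        subject = cohort.get(fill["subject_id"])
        if subject is None:
            continue  # no enrollment record, so exposure cannot be verified
        start, end = subject["coverage"]
        if not (start <= fill["fill_date"] <= end):
            continue
        ingredient = code_map.get(fill["drug_code"])
        if ingredient is None:
            # Keep unmapped codes visible for data quality review.
            subject["unmapped_codes"].add(fill["drug_code"])
        else:
            subject["ingredients"].add(ingredient)
    # Medical data: attach diagnosis codes to enrolled subjects.
    for dx in diagnoses:
        subject = cohort.get(dx["subject_id"])
        if subject is not None:
            subject["diagnoses"].add(dx["dx_code"])
    return cohort


# Toy records, invented for this example.
enrollment = [{"subject_id": "S1", "start": date(2018, 1, 1), "end": date(2018, 12, 31)}]
fills = [
    {"subject_id": "S1", "drug_code": "PROD-001", "fill_date": date(2018, 3, 1)},
    {"subject_id": "S1", "drug_code": "PROD-002", "fill_date": date(2018, 6, 1)},
    {"subject_id": "S1", "drug_code": "PROD-003", "fill_date": date(2019, 2, 1)},  # outside coverage
    {"subject_id": "S1", "drug_code": "PROD-999", "fill_date": date(2018, 7, 1)},  # not in mapping
    {"subject_id": "S2", "drug_code": "PROD-001", "fill_date": date(2018, 3, 1)},  # not enrolled
]
diagnoses = [{"subject_id": "S1", "dx_code": "DX-HYPERLIPIDEMIA"}]

cohort = build_cohort(enrollment, fills, diagnoses)
```

In this toy run, two distinct product codes collapse into a single ingredient for subject S1, the fill dated after coverage ends is excluded, and the unmapped code is retained separately so that it can be reviewed rather than silently dropped.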

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Number of pages: 18
State: Published - 2019

Publication series

Name: Methods in Molecular Biology
ISSN (Print): 1064-3745


  • Clinical comorbidity evaluation
  • Clinical data integration
  • Medication safety
  • Personalized medication therapy
