Theory-guided data science: A new paradigm for scientific discovery from data

Anuj Karpatne, Gowtham Atluri, James Faghmous, Michael Steinbach, Arindam Banerjee, Auroop Ganguly, Shashi Shekhar, Nagiza Samatova, Vipin Kumar

Research output: Contribution to journal > Article > peer-review

279 Scopus citations


Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.

Original language: English (US)
Article number: 7959606
Pages (from-to): 2318-2331
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 10
State: Published - Oct 1 2017

Bibliographical note

Publisher Copyright:
© 1989-2012 IEEE.


  • Atmospheric modeling
  • Biological system modeling
  • Data models
  • Data science
  • domain knowledge
  • interpretability
  • Knowledge discovery
  • Mathematical model
  • Numerical models
  • physical consistency
  • scientific theory

