Traditionally, materials discovery has been guided by basic physical rules, which embody a basic understanding of the material's physical characteristics of interest. However, discovering such rules remains challenging because patterns are inherently difficult to recognize in a materials space that is high-dimensional and highly nonuniformly distributed. A standard data-analytics approach based on machine learning (ML) may fail to produce meaningful results because the underlying assumptions and goals of ML differ fundamentally from those of materials discovery. ML mainly focuses on estimating complex black-box predictive models (nonlinear and multivariate), whereas materials discovery aims to arrive at interpretable, data-driven physical rules. Here, we tackle this problem by proposing a robust data-analytics framework for deriving basic physical rules from data. We introduce the concept of global and local modeling, which uses both supervised and unsupervised learning, for highly nonuniformly distributed materials data. To improve model interpretability, we also introduce a model-independent interpretation technique that helps human experts extract useful physical rules. We illustrate the proposed framework for extracting data-derived physical rules at the global and local levels with two case studies: (1) classification of van der Waals (vdW) versus non-vdW (nvdW) materials and (2) classification of wide-bandgap versus non-wide-bandgap vdW materials.
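The idea of global versus local rules can be illustrated with a minimal, dependency-free sketch. It is not the framework described above, and it makes several assumptions. A "physical rule" is modeled as a single-feature threshold rule (a decision stump). The unsupervised step that defines local regions is plain k-means. The model-independent interpretation technique is stood in for by permutation importance, because the abstract does not specify the actual technique. The two-descriptor data set is hypothetical toy data, and every function name is illustrative. On this data no single global rule beats 50% accuracy, while each cluster admits an exact local rule that uses only descriptor `x1`.

```python
import random

def make_toy_data():
    """Hypothetical toy data (NOT from the paper) with descriptors (x0, x1).
    Family A (x0 in 0..2) has label 1 when x1 > 0; family B (x0 in 10..12)
    has label 1 when x1 < 0, so no single global threshold rule works."""
    X, y = [], []
    for x0_values, sign in (((0, 1, 2), 1), ((10, 11, 12), -1)):
        for x0 in x0_values:
            for x1 in (-2, -1, 1, 2):
                X.append((x0, x1))
                y.append(1 if sign * x1 > 0 else 0)
    return X, y

def predict(rule, x):
    """A rule (j, t, p) predicts label p when x[j] > t, otherwise 1 - p."""
    j, t, p = rule
    return p if x[j] > t else 1 - p

def accuracy_of(predict_fn, X, y):
    return sum(predict_fn(x) == yi for x, yi in zip(X, y)) / len(y)

def fit_stump(X, y):
    """Supervised step: exhaustive search for the best single-feature threshold rule."""
    best_acc, best_rule = -1.0, None
    for j in range(len(X[0])):
        vals = sorted({x[j] for x in X})
        # One threshold below the minimum (a constant rule) plus all midpoints.
        thresholds = [vals[0] - 1] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        for t in thresholds:
            for p in (1, 0):
                rule = (j, t, p)
                acc = accuracy_of(lambda x: predict(rule, x), X, y)
                if acc > best_acc:
                    best_acc, best_rule = acc, rule
    return best_rule

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(X, k, iters=50):
    """Unsupervised step: plain k-means with deterministic farthest-point init.
    Assumes no cluster ends up empty (true for the toy data)."""
    centroids = [X[0]]
    while len(centroids) < k:
        centroids.append(max(X, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in X]
        new = []
        for c in range(k):
            members = [p for p, l in zip(X, labels) if l == c]
            new.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels

def permutation_importance(predict_fn, X, y, n_repeats=20, seed=0):
    """Model-agnostic interpretation: mean accuracy drop when one feature column
    is shuffled. Only predict_fn is called, so any black-box model fits here."""
    rng = random.Random(seed)
    base = accuracy_of(predict_fn, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            Xp = [x[:j] + (v,) + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy_of(predict_fn, Xp, y))
        scores.append(sum(drops) / n_repeats)
    return scores

def global_and_local_rules(X, y, k=2):
    """Fit one global rule, then one local rule (with importances) per cluster."""
    g = fit_stump(X, y)
    report = {"global": (g, accuracy_of(lambda x: predict(g, x), X, y)), "local": []}
    labels = kmeans(X, k)
    for c in range(k):
        Xc = [x for x, l in zip(X, labels) if l == c]
        yc = [yi for yi, l in zip(y, labels) if l == c]
        rule = fit_stump(Xc, yc)
        fn = lambda x, r=rule: predict(r, x)
        report["local"].append((rule, accuracy_of(fn, Xc, yc), permutation_importance(fn, Xc, yc)))
    return report

if __name__ == "__main__":
    rep = global_and_local_rules(*make_toy_data())
    print("global rule, accuracy:", rep["global"])
    for rule, acc, imp in rep["local"]:
        print("local rule:", rule, "accuracy:", acc, "importances:", imp)
```

In this sketch the local rule for the first cluster reads "label 1 if x1 > 0" and the one for the second cluster reads "label 1 if x1 < 0". The permutation importance of `x0` is exactly zero inside each cluster because neither local rule consults it. A decision stump was chosen only because it maps directly onto a human-readable threshold rule; in practice any classifier could replace it, since the interpretation step only calls the model's predictions.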
Bibliographical note
Funding Information:
This paper was supported by the National Science Foundation (NSF) under Contract No. DMREF-1921629.
© 2022 American Physical Society.