We present a machine learning framework to explore the predictability limits of catalytic activity from experimental descriptor data, which characterize catalyst formulations and reaction conditions. Artificial neural networks fuse the descriptor data to predict activity, and principal component analysis (PCA) and sparse PCA project the experimental data into an information space in which we identify regions of low and high predictability. The framework also incorporates a constrained-PCA optimization formulation that identifies new experimental points while excluding regions of the experimental space ruled out by constraints on technology, economics, and expert knowledge. This allows us to navigate the experimental space in a more targeted manner. We apply the framework to a comprehensive water–gas shift reaction data set containing 2228 experimental data points collected from the literature. Neural network analysis reveals strong predictability of activity across reaction conditions (e.g., varying temperature) but also important gaps in predictability across catalyst formulations (e.g., varying metal, support, and promoter). PCA reveals that these gaps arise because most experiments reported in the literature lie within narrow regions of the information space. We demonstrate that the framework can systematically guide experiments and the selection of descriptors in order to improve predictability and identify promising new formulations.
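The idea of projecting descriptor data into a PCA space and checking how well a region is covered by existing experiments can be illustrated with a minimal sketch. This is not the constrained-PCA formulation described above; it uses plain PCA computed by singular value decomposition, a nearest-neighbour distance as an assumed proxy for coverage, and synthetic data in place of the water–gas shift data set. The function names `pca_project` and `coverage_distance` are hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Center X and project it onto its leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, components, mean, explained[:n_components]


def coverage_distance(scores, query_scores, k=5):
    """Mean distance from each query point to its k nearest data points in PCA space.

    A large value suggests the query lies in a sparsely sampled region
    (an assumed proxy for low coverage, not the paper's criterion).
    """
    diffs = query_scores[:, None, :] - scores[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)

# Synthetic stand-in for descriptor data: 200 experiments, 6 descriptors,
# with most of the variance concentrated in the first two descriptors.
scales = np.array([3.0, 1.0, 0.3, 0.3, 0.2, 0.1])
X = rng.normal(size=(200, 6)) * scales

scores, components, mean, explained = pca_project(X, n_components=2)

# Two hypothetical candidate experiments: one at the centre of the data,
# one far out along the first principal direction.
candidates = np.vstack([mean, mean + 20.0 * components[0]])
cand_scores = (candidates - mean) @ components.T

sparsity = coverage_distance(scores, cand_scores, k=5)
print("explained variance ratio:", explained)
print("coverage distance (centre, far):", sparsity)
```

In this sketch the far candidate receives a much larger coverage distance than the central one, which is the kind of signal that could flag an under-explored region. Feasibility constraints would have to be added separately, for example by discarding candidates that violate bounds on the original descriptors.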
Bibliographical note
Funding Information:
Andrea Keane was supported by the WARF 2020 program at the University of Wisconsin-Madison.
© 2019 Elsevier B.V.
- Data analysis
- Machine learning