TY - JOUR
T1 - Benchmarking Accuracy and Generalizability of Four Graph Neural Networks Using Large In Vitro ADME Datasets from Different Chemical Spaces
AU - Broccatelli, Fabio
AU - Trager, Richard
AU - Reutlinger, Michael
AU - Karypis, George
AU - Li, Mufei
N1 - Publisher Copyright:
© 2022 Wiley-VCH GmbH
PY - 2022/8
Y1 - 2022/8
N2 - In this work, we benchmark a variety of single- and multi-task graph neural network (GNN) models against lower-bar and higher-bar traditional machine learning approaches employing human engineered molecular features. We consider four GNN variants – Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and Attentive Fingerprint (AttentiveFP). So far deep learning models have been primarily benchmarked using lower-bar traditional models solely based on fingerprints, while more realistic benchmarks employing fingerprints, whole-molecule descriptors and predictions from other related endpoints (e. g., LogD7.4) appear to be scarce for industrial ADME datasets. In addition to time-split test sets based on Genentech data, this study benefits from the availability of measurements from an external chemical space (Roche data). We identify GAT as a promising approach to implementing deep learning models. While all the deep learning models significantly outperform lower-bar benchmark traditional models solely based on fingerprints, only GATs seem to offer a small but consistent improvement over higher-bar benchmark traditional models. Finally, the accuracy of in vitro assays from different laboratories predicting the same experimental endpoints appears to be comparable with the accuracy of GAT single-task models, suggesting that most of the observed error from the models is a function of the experimental error propagation.
AB - In this work, we benchmark a variety of single- and multi-task graph neural network (GNN) models against lower-bar and higher-bar traditional machine learning approaches employing human engineered molecular features. We consider four GNN variants – Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and Attentive Fingerprint (AttentiveFP). So far deep learning models have been primarily benchmarked using lower-bar traditional models solely based on fingerprints, while more realistic benchmarks employing fingerprints, whole-molecule descriptors and predictions from other related endpoints (e. g., LogD7.4) appear to be scarce for industrial ADME datasets. In addition to time-split test sets based on Genentech data, this study benefits from the availability of measurements from an external chemical space (Roche data). We identify GAT as a promising approach to implementing deep learning models. While all the deep learning models significantly outperform lower-bar benchmark traditional models solely based on fingerprints, only GATs seem to offer a small but consistent improvement over higher-bar benchmark traditional models. Finally, the accuracy of in vitro assays from different laboratories predicting the same experimental endpoints appears to be comparable with the accuracy of GAT single-task models, suggesting that most of the observed error from the models is a function of the experimental error propagation.
KW - ADME
KW - deep learning
KW - graph neural network
KW - in vitro assays
KW - multi-task learning
UR - http://www.scopus.com/inward/record.url?scp=85125072817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125072817&partnerID=8YFLogxK
U2 - 10.1002/minf.202100321
DO - 10.1002/minf.202100321
M3 - Article
C2 - 35156325
AN - SCOPUS:85125072817
SN - 1868-1743
VL - 41
JO - Molecular Informatics
JF - Molecular Informatics
IS - 8
M1 - 2100321
ER -