Pretraining Graph Neural Networks on Ultra Large Chemical Libraries to Learn Generalizable ADMET Predictors

August 10, 2021
Events, ACS2021

Atomwise team members were selected to present their research at the American Chemical Society Fall 2021 National Meeting & Expo. Learn what our Atoms have been working on below, and visit Atomwise at the ACS Fall 2021 National Meeting for other presentation sessions.


Hossam Ashtawy, PhD

Atomwise Co-Authors: Brandon Anderson, Jon Sorenson, Izhar Wallach

Title: Pretraining Graph Neural Networks on Ultra Large Chemical Libraries to Learn Generalizable ADMET Predictors

Division: Computers in Chemistry

Graph neural networks (GNNs) have demonstrated superior performance over conventional molecular representation approaches in ligand- and structure-based drug discovery applications. The quality of the representations they generate automatically from raw chemical structures depends on the model architecture and, most importantly, on the quality of the training data. Larger neural network models typically generate more expressive representations. However, in data-constrained settings, such as modeling expensive in vitro and in vivo ADMET endpoints and molecular properties, large GNN models can quickly overfit the training data, including its biases and noise, and consequently fail to generalize. To overcome these shortcomings, we propose a pretraining strategy that initializes an embedding GNN model to encode molecules into generalizable latent representations. The model is pretrained on compounds from a very large chemical space to jointly learn a large, diverse set of efficiently computable pretraining tasks, including physicochemical properties and the presence of different functional groups. The pretraining tasks in the embedding GNN are then replaced by the target multi-task ADMET endpoints and molecular properties, and the resulting model is fine-tuned on experimental data. We compare our approach with non-pretrained models and show that pretraining yields significant performance gains on out-of-distribution validation compounds for almost all ADMET tasks, particularly those with less training data. We also show that the generated embeddings are effective at capturing similarity between compounds and at querying chemical libraries.
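
To make the two-stage workflow in the abstract more concrete, here is a minimal, dependency-free Python sketch of the head-swapping idea: a shared embedding encoder keeps its weights while the pretraining task heads are replaced by ADMET task heads. It is not the authors' implementation. The class names, task names, toy message-passing step, and feature sizes are all hypothetical, and the training loops are left out entirely.

```python
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class EmbeddingGNN:
    """Hypothetical shared encoder: one round of neighbor averaging,
    a linear layer with ReLU, then sum pooling over atoms."""

    def __init__(self, n_atom_features, embed_dim, rng):
        self.weights = make_matrix(embed_dim, n_atom_features, rng)

    def embed(self, atom_features, bonds):
        n_atoms = len(atom_features)
        neighbors = [[i] for i in range(n_atoms)]  # each atom also sees itself
        for a, b in bonds:
            neighbors[a].append(b)
            neighbors[b].append(a)
        embedding = [0.0] * len(self.weights)
        for i in range(n_atoms):
            # Average the features of atom i and its bonded neighbors.
            mean = [
                sum(atom_features[j][k] for j in neighbors[i]) / len(neighbors[i])
                for k in range(len(atom_features[i]))
            ]
            hidden = [max(0.0, h) for h in matvec(self.weights, mean)]  # ReLU
            embedding = [e + h for e, h in zip(embedding, hidden)]  # sum pooling
        return embedding


class MultiTaskModel:
    """Shared encoder plus one linear head per task."""

    def __init__(self, encoder, task_names, rng):
        self.encoder = encoder
        self.rng = rng
        self.heads = {}
        self.replace_heads(task_names)

    def replace_heads(self, task_names):
        # Discard the old heads and create fresh ones; the encoder is untouched.
        embed_dim = len(self.encoder.weights)
        self.heads = {
            name: make_matrix(1, embed_dim, self.rng)[0] for name in task_names
        }

    def predict(self, atom_features, bonds):
        z = self.encoder.embed(atom_features, bonds)
        return {
            name: sum(w * x for w, x in zip(head, z))
            for name, head in self.heads.items()
        }


rng = random.Random(0)
encoder = EmbeddingGNN(n_atom_features=3, embed_dim=4, rng=rng)

# Stage 1: cheap, computable pretraining tasks (names are illustrative only).
pretrain_tasks = ["logp", "molecular_weight", "has_carboxylic_acid"]
model = MultiTaskModel(encoder, pretrain_tasks, rng)
# ... pretraining on a large library would update encoder and heads here ...
encoder_before = [row[:] for row in encoder.weights]

# Stage 2: swap in ADMET heads (names are illustrative only), keep the encoder.
admet_tasks = ["solubility", "microsomal_clearance"]
model.replace_heads(admet_tasks)
# ... fine-tuning on experimental data would happen here ...

# Toy molecule: three atoms in a chain with one-hot atom features.
atoms = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
bonds = [(0, 1), (1, 2)]
predictions = model.predict(atoms, bonds)
print(sorted(predictions))
```

After the swap, the model reports only the ADMET tasks while `encoder.weights` is unchanged, which is what lets fine-tuning start from the pretrained representation. A practical version would presumably use a GNN library and real molecular featurization in place of this toy encoder.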


Join our team

Our team comprises over 40 PhD scientists who contribute to a high-performance, academic-like culture that fosters robust scientific and technical excellence. We strongly believe that data wins over opinions, and we aim for as little dogma as possible in our decision-making. Learn more about our team and opportunities at Atomwise.