Structure-Based Drug Design with Multi-Task Learning and Data Augmentation

April 02, 2021
Events, ACS2021

At the American Chemical Society Spring 2021 Virtual Meeting & Expo, several Atomwise members and partners were selected to present their research. Learn what our Atoms have been working on below, and visit Atomwise at the ACS Spring 2021 Virtual Meeting & Expo for other presentation sessions.

Pawel Gniewek, PhD

Atomwise Co-Authors: Bastiaan Bergman, Bradley Worley

Title: Structure-Based Drug Design with Multi-Task Learning and Data Augmentation

Division: COMP




With rapid advances in machine learning methods and the availability of vast amounts of chemical data, structure-based drug design is at the dawn of a golden age. The tremendous successes of deep learning in natural language processing, speech recognition, and computer vision have raised expectations that these emerging technologies will successfully target undruggable proteins and novel sites on well-established pharmaceutical targets. In recent years, the scientific community has reported excellent performance of deep learning methods on various benchmarks for virtual high-throughput screening, QSAR, and ADMET tasks. Nevertheless, follow-up work often reveals that many of these methods fail to deliver prospectively the performance initially reported on retrospective benchmarks. These failures suggest that the described approaches do not generalize as well as expected and are instead overfitting to the training set or simply cheating, i.e., exploiting artifacts of the training and test sets that yield outstanding benchmark performance but have little practical value (known as “Clever Hans” solutions in the machine learning community). In this work, we present a battery of benchmarks designed to detect and flag models that, despite excellent retrospective performance, are likely to perform poorly when applied prospectively. Once the pathological properties of these models are identified, we show how they can be systematically corrected through a combination of data augmentation and multi-task learning. We use a data augmentation technique called “pose-negatives”, in which poor poses are used as negative data points, together with multi-task learning that biases models toward physically plausible solutions.
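To make the “pose-negatives” idea concrete, here is a minimal labeling sketch. It assumes each training example is a docked pose with a known RMSD to a reference (crystal) pose, and it uses hypothetical cutoffs of 2 Å (near-native) and 4 Å (clearly wrong); the abstract does not state the thresholds, data layout, or names used, so `DockedPose`, `label_with_pose_negatives`, and both constants are illustrative only. The key property is that the same active ligand appears with both labels, so a model that keys on ligand identity alone (a “Clever Hans” shortcut) cannot separate the positives from the pose-negatives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical thresholds (Å); the abstract does not state the values used.
GOOD_POSE_RMSD = 2.0  # at or below: treated as a near-native (correct) pose
BAD_POSE_RMSD = 4.0   # at or above: treated as a clearly wrong pose


@dataclass(frozen=True)
class DockedPose:
    """One docked protein-ligand pose (hypothetical record type)."""
    ligand_id: str
    is_active: bool                  # experimentally confirmed binder?
    rmsd_to_native: Optional[float]  # RMSD to the reference pose; None for decoys


def label_with_pose_negatives(poses: List[DockedPose]) -> List[Tuple[DockedPose, int]]:
    """Return (pose, label) pairs: 1 = active in a near-native pose, 0 = negative."""
    labeled: List[Tuple[DockedPose, int]] = []
    for pose in poses:
        if not pose.is_active:
            labeled.append((pose, 0))  # ordinary decoy negative
        elif pose.rmsd_to_native is None:
            continue  # active without a reference pose: pose quality unknown, skip
        elif pose.rmsd_to_native <= GOOD_POSE_RMSD:
            labeled.append((pose, 1))  # right ligand, right pose
        elif pose.rmsd_to_native >= BAD_POSE_RMSD:
            labeled.append((pose, 0))  # pose-negative: right ligand, wrong pose
        # poses between the two thresholds are ambiguous and are dropped
    return labeled


poses = [
    DockedPose("lig_A", True, 1.2),      # near-native pose of an active
    DockedPose("lig_A", True, 6.5),      # same active, badly misdocked
    DockedPose("lig_A", True, 3.0),      # ambiguous, skipped
    DockedPose("decoy_1", False, None),  # decoy
]
print([label for _, label in label_with_pose_negatives(poses)])  # [1, 0, 0]
```

Dropping the ambiguous middle band is one design choice among several; another option would be to keep those poses with a soft label.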
The methods proposed in this work are general: they work for both grid- and graph-based convolutional neural network models and, when paired with the presented battery of benchmarks, set new community standards for model robustness in prospective discovery campaigns.
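One reason such a scheme can be architecture-agnostic is that a multi-task objective only consumes per-example model outputs, whether they come from a grid-based or a graph-based network. The sketch below is an assumption, not the presented method: the abstract does not name the auxiliary task, so we suppose a second output head that predicts pose quality (e.g., label 1 if the pose lies within some RMSD cutoff of the crystal pose) and add its binary cross-entropy to the activity loss with a hypothetical weight `pose_weight`.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy computed from a raw logit."""
    # Equivalent to -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multitask_loss(
    activity_logits: Sequence[float],
    activity_labels: Sequence[float],
    pose_logits: Sequence[float],
    pose_labels: Sequence[float],
    pose_weight: float = 0.5,  # hypothetical weight of the auxiliary pose task
) -> float:
    """Mean activity loss plus a weighted mean auxiliary pose-quality loss."""
    n = len(activity_logits)
    if n == 0 or not (len(activity_labels) == len(pose_logits) == len(pose_labels) == n):
        raise ValueError("all inputs must be non-empty and have the same length")
    activity = sum(bce_with_logits(z, y) for z, y in zip(activity_logits, activity_labels)) / n
    pose = sum(bce_with_logits(z, y) for z, y in zip(pose_logits, pose_labels)) / n
    return activity + pose_weight * pose


# Two examples: a binder in a near-native pose, and a pose-negative.
print(multitask_loss([2.0, -1.0], [1.0, 0.0], [1.5, -2.0], [1.0, 0.0]))
```

Because the auxiliary term penalizes a model that scores poses without regard to their geometry, it pushes the shared representation toward features that reflect protein-ligand contacts; how strongly depends on `pose_weight`, which would need tuning in practice.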


Slide Presentation

ACS Spring 2021_Pawel Gniewek


Join our team

Our team comprises over 80 PhD scientists who contribute to a high-performance, academic-like culture that fosters scientific and technical excellence. We strongly believe that data wins over opinions, and we aim for as little dogma as possible in our decision making. Learn more about our team and opportunities at Atomwise.