Chimeric Molecules as Adversarial Training Examples for Machine Learning

April 02, 2021
Events, ACS2021

At the American Chemical Society Spring 2021 Virtual Meeting & Expo, several Atomwise members and partners were selected to present their research. Learn what our Atoms have been working on below, and visit Atomwise at the ACS Spring 2021 Virtual Meeting & Expo for other presentation sessions.

Jon Sorenson, PhD

Atomwise Co-Authors: Shabbir Suterwala, Izhar Wallach

Title: Chimeric Molecules as Adversarial Training Examples for Machine Learning

Division: CINF




The application of machine learning techniques to cheminformatics data has accelerated tremendously in recent years. In particular, deep learning architectures show considerable promise in extracting patterns and features from large datasets that aren’t readily obvious to human experts or apparent with hand-crafted features. With this larger modeling capacity comes greater potential for machine learning models to “cheat” and identify trivial hyperplanes separating class labels, rather than learning generalizable, physics-based properties of the dataset. An example of such a hyperplane would be where a model learns that a particular functional group, for example a terminal sulfonylamide, is always associated with binding affinity, regardless of where that functional group is placed on the molecule. Using such a model in a predictive setting leads to an overabundance of predictions favoring that functional group without regard to the molecular scaffold or the presence of other substituents.

We have devised a general scheme for combating this form of bias. Using a fragment-based genetic algorithm, we take any compound in our training set and form a set of scrambled compounds that contain parts of this compound plus a fraction of new fragments. These compounds are used as decoys during model training to mitigate bias due to any one fragment. We show here that the use of these decoys reduces undesirable functional group bias in model predictions using the AtomNet® structure-based architecture and proteochemometric models. The decoys also reduce the occurrence of “favorite” molecules that are predicted to bind strongly regardless of the target protein. The algorithmic approach to forming these chimeric molecules is general and can be readily adapted to condition models in other ligand-based modeling contexts, such as the prediction of ADMET properties.
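To make the decoy idea concrete, here is a minimal toy sketch in Python. It is not the fragment-based genetic algorithm from the presentation, whose details the abstract does not give. It assumes a molecule can be represented as a plain list of fragment names, and it ignores real chemistry such as attachment points and valence. The function name `make_chimeras`, the `new_fraction` parameter, and the example fragments are all hypothetical, chosen only for illustration.

```python
import random


def make_chimeras(parent, pool, n_decoys=5, new_fraction=0.5, seed=0):
    """Build scrambled decoys from one parent compound (toy model).

    parent: list of fragment names making up the training compound.
    pool: fragment names to draw replacements from (hypothetical library).
    new_fraction: share of the parent's fragments swapped for new ones.
    """
    rng = random.Random(seed)
    # Only fragments absent from the parent count as "new".
    novel = [frag for frag in pool if frag not in parent]
    if not novel:
        raise ValueError("pool must contain fragments not present in parent")

    # Swap at least one fragment, and never more than the parent has.
    n_new = min(len(parent), max(1, round(new_fraction * len(parent))))

    decoys = []
    for _ in range(n_decoys):
        child = list(parent)
        # Replace n_new randomly chosen positions with new fragments.
        for i in rng.sample(range(len(child)), n_new):
            child[i] = rng.choice(novel)
        # Scramble the order so retained fragments land in new positions.
        rng.shuffle(child)
        decoys.append(child)
    return decoys


# Hypothetical parent compound and fragment pool.
parent = ["sulfonamide", "phenyl", "piperidine", "amide"]
pool = ["pyridine", "morpholine", "furan", "methoxy", "phenyl"]

decoys = make_chimeras(parent, pool, n_decoys=3)

# The parent keeps its positive label; each chimera is added as a negative.
training_rows = [(parent, 1)] + [(decoy, 0) for decoy in decoys]
```

Because each decoy keeps some of the parent's fragments but carries a negative label, a model can no longer separate the classes on the presence of a single fragment alone, which is the kind of shortcut the abstract describes.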


Slide Presentation

ACS Spring_Atomwise_Jon Sorenson


Join our team

Our team comprises over 40 PhD scientists who contribute to a high-performance, academic-like culture that fosters robust scientific and technical excellence. We strongly believe that data wins over opinions, and we aim for as little dogma as possible in our decision-making. Learn more about our team and opportunities at Atomwise.