Behind the AI: How We Use Benchmarks to Ensure Reliable Performance of a Model

August 16, 2021
AI Technology, AtomNet

When you’re developing deep learning models, it’s not always obvious why a model performs the way it does. After all, the whole point of deep learning is to let the network teach itself from training data — and the way it assimilates that information is not always what we would expect. So is it truly learning from the data, or simply memorizing it? If it’s making an association between two groups of data, is it an association that matters to us?

For the Atomwise team, understanding how networks are learning is essential to ensuring that our AtomNet® model for structure-based drug design is as accurate as possible for predicting which proteins and compounds will bind to each other. Models built on retrospective data may not always perform well for prospective scenarios.

At a recent American Chemical Society event, Atomwise Senior Scientist Pawel Gniewek gave a presentation about his team’s latest efforts to develop performance benchmarks to evaluate how well our models handle certain tasks. “It can be hard to judge if good performance means the model was trained well,” Gniewek says. “Models are very good at exploiting data and may pick up associations that are not important.”

He and his colleagues have been developing a series of benchmarks to help determine whether a model is actually functioning as intended. “It’s very hard to control for this since we can’t manually look at every single data point and rule out any bias,” he says. With the benchmarks, the team can run tests to look more holistically at performance and gauge how well the model made use of its training data.

For Atomwise’s purposes, the best-performing models use their training data to learn about physics, since it’s the structure, shape, and interplay of molecules that matter for binding. Gniewek used multi-task learning (that is, asking the model to predict the answers to several loosely related problems at once) to help the model make the most of the data it had been given. Because it was asked about binding as well as whether a molecular pose looked possible, the model had to work harder and learn more about the physics underlying each trait. This approach helped to identify models that reflect the physics of the problem, rather than relying on spurious associations that would not generalize and would fail in prospective analyses.
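
To make the multi-task idea concrete, here is a minimal sketch of a combined training objective: one loss term for the binding prediction and one for the pose-plausibility prediction, added together. This is an illustration only, not the AtomNet® implementation; the function names, the use of binary cross-entropy, and the pose_weight parameter are all assumptions made for the example.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multi_task_loss(binding_prob, binding_label,
                    pose_prob, pose_label, pose_weight=0.5):
    """Hypothetical two-task objective: binding plus pose plausibility.

    pose_weight is an assumed hyperparameter that controls how much the
    auxiliary pose task contributes relative to the binding task.
    """
    binding_loss = binary_cross_entropy(binding_prob, binding_label)
    pose_loss = binary_cross_entropy(pose_prob, pose_label)
    return binding_loss + pose_weight * pose_loss


# A model that gets binding right but misjudges the pose is still penalized.
good_pose = multi_task_loss(0.9, 1, 0.9, 1)
bad_pose = multi_task_loss(0.9, 1, 0.1, 1)
print(good_pose < bad_pose)
```

Because both terms share one model, a network cannot minimize this kind of objective by exploiting a shortcut that only helps the binding label; it also has to get the pose question right.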

In addition, Gniewek provided the model with what he calls “pose negatives,” poses that do not fit, in order to augment the training data with negative data points. “When we do that, we push the model to learn what we think is not physically plausible,” he says. “If we can guide the model about what to focus on, it will learn faster.” The technique also increases the amount of data we can use to train models, which pays dividends in neural networks. Ultimately, Gniewek hopes this kind of approach will lead to generalizable models that will enable us to target binding sites that were previously considered inaccessible.
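
The post does not describe how pose negatives are generated, so the following is only a rough sketch of the general idea of augmenting a dataset with implausible poses. It assumes a pose is a list of (x, y, z) ligand atom coordinates and creates a negative by rigidly shifting the ligand a few ångströms in a random direction, away from where it fits. The function names, the shift range, and the labeling scheme are hypothetical.

```python
import math
import random


def make_pose_negative(coords, rng, min_shift=4.0, max_shift=8.0):
    """Rigidly translate a ligand pose by a random offset (assumed scheme)."""
    while True:
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in direction))
        if norm > 1e-9:  # retry in the unlikely case of a near-zero vector
            break
    shift = rng.uniform(min_shift, max_shift)
    dx, dy, dz = (shift * c / norm for c in direction)
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def augment_with_pose_negatives(poses, negatives_per_pose=1, seed=0):
    """Return (coords, label) pairs: label 1 for originals, 0 for negatives."""
    rng = random.Random(seed)
    dataset = []
    for coords in poses:
        dataset.append((coords, 1))
        for _ in range(negatives_per_pose):
            dataset.append((make_pose_negative(coords, rng), 0))
    return dataset


# One original pose becomes three training examples.
pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
examples = augment_with_pose_negatives([pose], negatives_per_pose=2, seed=1)
print(len(examples))
```

Each original pose yields several extra labeled examples, which is how this kind of augmentation increases the amount of training data.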

While this work is interesting on its own to advance the AI field, it’s far more than just theoretical for our team. “At Atomwise, we are going after proteins for which there is little to no data — targets that were previously thought to be undruggable,” Gniewek says. “The reason we care so much about developing the best structure-based model is so that we can apply it to a protein for which we do not have the data and have it help us find a compound that can bind. This is a renaissance of structure-based design using deep learning.”


Learn More

Pawel Gniewek, PhD, was selected to present on this topic at the American Chemical Society Spring 2021 Virtual Meeting & Expo. Take a deeper dive by viewing his presentation deck, “Structure-Based Drug Design with Multi-Task Learning and Data Augmentation.”


About Atomwise

Atomwise is a preclinical pharma company revolutionizing how drugs are discovered with AI. We invented the use of deep learning for structure-based drug discovery, and today we are developing a pipeline of small-molecule drug candidates advancing into preclinical studies. Our AtomNet® technology has been used to unlock more undruggable targets than any other AI drug discovery platform. We are tackling over 600 unique disease targets with more than 250 partners around the world, including leading pharmaceutical, agrochemical, and emerging biotechnology companies. Atomwise has raised over $174 million from leading venture capital firms to advance our mission to make better medicines, faster.
