June 5, 2024

NEW YORK – The latest version of DeepMind's AlphaFold program offers improved prediction of protein-protein interactions as well as of binding for a wide variety of biomolecules.

Detailed in a paper in Nature last month, the model, called AlphaFold 3 (AF3), builds on the capabilities of previous AlphaFold models, providing a roughly 10 percent improvement in accuracy for predictions of general protein-protein interactions and roughly double the accuracy for predictions of protein-antibody binding, said Max Jaderberg, chief AI officer at Alphabet subsidiary Isomorphic Labs and one of the senior authors on the study.

AF3 also expands beyond protein-protein interactions to enable predictions of a variety of biomolecular structures including protein-nucleic acid complexes, protein-small molecule complexes, and protein post-translational modifications.

Jaderberg noted that previously, researchers typically had to use separate software tools to model different kinds of complexes. The data presented in the Nature study indicates that AF3 is able to predict the structure of multiple types of complexes with higher accuracy than most of these individual tools.

Beyond improved performance, the ability to predict different types of complexes using a single model streamlines scientists' workflows, Jaderberg said, citing Isomorphic's internal research as an example.

"What we see from our chemists and biologists is that as soon as you have multiple tools, they have different interfaces, different inputs, different data structures, different output formats," he said. "So there's a lot of plumbing work that needs to be done when you have different tools and you [want to] put them together."

Launched in 2021 by Demis Hassabis, the CEO and founder of DeepMind, Isomorphic Labs works with DeepMind to develop and apply computational techniques for drug development and medical research. The company is using tools like AF3 and what Jaderberg called the "frontier versions of our models" for internal drug discovery programs as well as in partnerships with pharma firms including Eli Lilly and Novartis.

Key to AF3's improved capabilities is the addition of a diffusion model framework, the same machine learning approach used in image-generation models like Midjourney, Jaderberg said.

"Instead of operating in pixel space [as with Midjourney], this [AF3] diffusion model operates in atom-coordinate space," he said. "The model starts with a noisy cloud of atoms that make up your complex structure, and over time, the model refines the coordinates of this noisy cloud of atoms and morphs it into the actual structure that it predicts."
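The refinement loop Jaderberg describes can be illustrated with a toy sketch. This is not AF3's algorithm or code: the function names, step size, and four-atom target below are hypothetical, and the stand-in "denoiser" simply uses a known target structure where a real diffusion model would call a trained neural network. Real samplers also typically re-inject noise between steps, which is omitted here.

```python
import random


def toy_denoise(coords, predicted, step):
    """Move every atom a fraction `step` of the way toward the predicted position."""
    return [
        tuple(c + step * (p - c) for c, p in zip(atom, guess))
        for atom, guess in zip(coords, predicted)
    ]


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return (total / len(a)) ** 0.5


def refine(target, n_steps=50, step=0.2, seed=0):
    """Start from a noisy cloud of atoms and iteratively refine its coordinates."""
    rng = random.Random(seed)
    # One randomly placed atom per atom in the target structure.
    coords = [tuple(rng.gauss(0.0, 10.0) for _ in range(3)) for _ in target]
    history = [rmsd(coords, target)]
    for _ in range(n_steps):
        # ASSUMPTION: a trained network would estimate the denoised structure
        # here; this toy cheats by using the known target directly.
        coords = toy_denoise(coords, target, step)
        history.append(rmsd(coords, target))
    return coords, history


# Hypothetical four-atom "structure" used only for illustration.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.5)]
final_coords, history = refine(target)
print(f"RMSD before: {history[0]:.3f}, after: {history[-1]:.6f}")
```

In this sketch, the distance between the cloud and the target shrinks at every step, mirroring the "noisy cloud" gradually morphing into a predicted structure.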

Also important was the use of a broader set of training data, moving beyond the protein data used to train the previous AlphaFold models to datasets that include complexes between proteins and RNA, DNA, small molecules, and other elements.

Jaderberg said that this more diverse training dataset not only allowed AF3 to move beyond protein complex prediction but might also have contributed to its improved performance on protein-protein interactions.

"Expanding from just modeling proteins to including other molecule types adds more data, but it also allows our models to start drawing inferences between how different atoms can be related to each other," he said.

Incorporating these broader datasets into the model's training was tricky, though, Jaderberg said.

"How do we expand a model that just deals with amino acids to also deal with nucleic acids and also deal with small molecules, which actually don't even have that sort of defined grammar you have for amino acids and nucleic acids?" he said. "Just working out how we take these different molecular types, represent them in a way that the neural network can understand the inputs, incorporate them, and process them together … is quite a big problem and was some of the crux of the AlphaFold 3 research."

While much of Isomorphic Labs' work is focused on drug discovery, academic researchers outside the company have used the AF models for a variety of research pursuits, including protein-protein interaction work.

Last year, for instance, a team led by Juri Rappsilber, a professor of proteomics at the University of Edinburgh and a professor of bioanalytics at the Berlin Institute of Technology, published a study on combining cross-linking mass spectrometry and co-fractionation mass spec with the AlphaFold-Multimer software, an extension of AF2 intended for protein-protein interaction (PPI) research, to predict and validate PPIs in Bacillus subtilis.

Also last year, researchers at the SciLifeLab at Stockholm University and the European Bioinformatics Institute used a newly developed pipeline for AF2-based protein-protein interaction (PPI) prediction, called FoldDock, to predict structures for 65,484 human PPIs, generating 3,137 high-confidence PPI models.

A common use of AF within PPI research has been for screening candidate protein interactors against proteins of interest, similar to how researchers might use an immune-pulldown mass spectrometry experiment. Using AF in this way can help them narrow down potential interactions for more in-depth follow-up. Because the AF results also provide models of the interaction, researchers can more easily design point mutations to disrupt the putative interaction and then look for any biological effect.

Rappsilber said the AF3 model appears to be a "major step forward" not only in capability but also in terms of access, noting that the tool's web-based interface makes its use "essentially as easy as doing a web search."

"Any student or professor can just use AlphaFold3," he said. "There is no need for advanced computer literacy. No installation needed, no complicated formatting of uploaded information, an easy interface. One could imagine using it in teaching an undergraduate class."

Rappsilber said, however, that because AF3's code is not currently open, he and other researchers are limited in how they can use it. For instance, his lab created open re-implementations of AF2 that let the researchers incorporate experimental data into the model's predictions, which can both improve the predictions and reduce the amount of computing power required to make them. In a Nature Biotechnology paper published last year, for example, Rappsilber and colleagues showed that incorporating cross-linking mass spec data allowed them to generate structures using AF2 that the model was not able to predict on its own.

Because AF3 allows modeling of complexes with non-protein components, for which there is limited training data, he said he would expect the inclusion of experimental data to boost the model's performance even more than it did for previous AF models.

"The code of AF3 should be made openly available and with it also the training data," he said. "This would allow us to add experimental data functionality."

Regarding access to AF3's code and training data, Jaderberg said the Nature paper has "all of the methods including the pseudocode for the models," adding that the developers will be "sharing the AlphaFold 3 weights and inference code for academic use over the next six months."