NEW YORK – Investigators at Google DeepMind have developed a machine learning-based tool known as AlphaMissense for predicting the pathogenicity of missense variants in protein-coding genomic regions.
"What we are presenting today is our project called AlphaMissense," Pushmeet Kohli, VP of research in AI for science at Google DeepMind, said during a press briefing on Monday. "This is a project that relates to the latest advance we have made on the missense variant prediction problem."
For a paper appearing in Science on Tuesday, Kohli and his colleagues used the AlphaMissense tool to tally likely pathogenic outcomes across the suite of missense mutations that might arise in more than 19,200 canonical human protein-coding sequences.
The tool is "trained from protein sequences," first author Jun Cheng, a research scientist and team leader with Google DeepMind in London, explained at Monday’s briefing. "By training, it sees millions of protein sequences and learns what a regular protein sequence looks like."
Cheng noted that the machine-learning model builds on Google DeepMind’s previous protein structure prediction tool, AlphaFold. In contrast to AlphaFold, though, AlphaMissense has been trained with missense variant data from humans, nonhuman primates, and other organisms.
Consequently, AlphaMissense "has the ability, the capability, to really learn from evolutionary constraints from related sequences," he said, calling it a protein language model.
"When it’s given a protein sequence with a mutation, it can tell us whether this looks bad or not," Cheng said, explaining that the model "assigns a score between zero and one to each of the variants, indicating how likely the variant is pathogenic."
"By pathogenic, we mean the variant is more likely to be associated with disease or to cause disease," he clarified.
At a precision threshold of 90 percent, for example, the researchers classified 89 percent of 71 million possible missense variants, including 32 percent they predicted to be likely pathogenic and 57 percent they called likely benign. The remaining 11 percent of missense mutations were classified as uncertain.
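As a rough illustration of how a score between zero and one can be turned into the three classes described here, the following Python sketch bins scores using two cutoffs. The cutoff values, function names, and example scores are hypothetical placeholders chosen for illustration. They are not taken from the paper, and the thresholds the team actually used may differ.

```python
from collections import Counter

# Hypothetical cutoffs for illustration only; not the published thresholds.
BENIGN_MAX = 0.3
PATHOGENIC_MIN = 0.6


def classify(score, benign_max=BENIGN_MAX, pathogenic_min=PATHOGENIC_MIN):
    """Map a pathogenicity score in [0, 1] to one of three labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_max:
        return "likely benign"
    if score > pathogenic_min:
        return "likely pathogenic"
    # Scores between the two cutoffs are left unresolved.
    return "uncertain"


def class_fractions(scores):
    """Return the share of scores that falls into each label."""
    counts = Counter(classify(s) for s in scores)
    total = len(scores)
    return {label: n / total for label, n in counts.items()}


# Made-up scores, not real AlphaMissense output.
example_scores = [0.05, 0.12, 0.45, 0.71, 0.93]
print(class_fractions(example_scores))
```

In a scheme like this, moving the two cutoffs closer together classifies more variants but at lower precision, while moving them apart sends more variants into the uncertain group.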
The team emphasized that the vast majority of missense mutations they assessed in human proteins have not actually been found in nature, suggesting that many predicted pathogenic protein changes have been weeded out during the course of human evolution.
Indeed, although each person carries thousands of missense variants on average, only around 6 percent of the mutations assessed by AlphaMissense have previously been reported in humans.
A far smaller fraction of those variants was previously classified as pathogenic or benign by experts, the authors noted, suggesting that machine learning and artificial intelligence may provide help in classifying variants of unknown significance as well as new variants found in humans in the future.
"The number of variants we know experimentally, or by human experts, is tiny," co-senior and co-corresponding author Žiga Avsec, a research scientist and team leader at Google DeepMind, told reporters on Monday.
"I hope that these predictions will give us an extra insight into how we are able to pinpoint which variants cause disease, as well as apply them to other applications in genomics," he added.
From their findings so far, the study’s authors suggested that AlphaMissense may be especially suited for helping human genetics researchers and, eventually, clinicians to distinguish between unclassified variants that do or do not contribute to disease, while potentially tracking down distinct genetic contributors in conditions with similar symptoms.
AlphaMissense-based predictions "may illuminate the molecular effects of variants on protein function, contribute to the identification of pathogenic missense mutations and previously unknown disease-causing genes, and increase the diagnostic yield of rare genetic diseases," they wrote.
Even so, the investigators cautioned that AlphaMissense is not intended to be used for clinical diagnoses on its own. Rather, it is meant to complement existing strategies for classifying variants as pathogenic or likely pathogenic in the clinic, an approach they used during a collaboration with Genomics England to validate AlphaMissense predictions in previously profiled rare disease cases.
"This sort of acts as an extra filtering layer, where you may gain extra clarity on what some of the variants do, and that may help you potentially better narrow down the list of variants," Avsec said.
In an accompanying perspective article in Science, University of Edinburgh researcher Joseph Marsh and Sarah Teichmann, an investigator at the Wellcome Sanger Institute and the University of Cambridge, noted that AlphaMissense’s computational approach "will undoubtedly be helpful for variant interpretation and prioritization."
Still, the duo warned that "it is important not to confuse these labels [likely pathogenic and likely benign] with the very specific clinical definitions of these terms, which rely on multiple lines of evidence."
In addition, they pointed to gaps that remain with the approach, noting that the proposed classification tool does not yet take more complicated protein patterns, such as protein complex formation or cell type-specific protein interactions, into consideration when predicting variant effects.
"Although no [variant effect predictors] can, as of now, be relied on alone for genetic diagnosis," Teichmann and Marsh wrote, "their utility in the diagnostic odyssey will continue to improve as both computational approaches and strategies for their interpretation advance."