NEW YORK – Two papers published this week in the American Journal of Human Genetics have demonstrated the feasibility of using recently developed machine-learning tools and computational modeling to incorporate insights from structural biology into the process of variant classification.
The studies appeared alongside a commentary piece authored by three scientists at Ambry Genetics discussing some of the nuances and caveats of this type of protein structure prediction, important lessons learned from the two new studies, and the company's own history of using structural biology internally.
In an interview, Ambry's Marcy Richardson said that the aim is for the new publications to be a "practical guide" that may spur others in the community to follow suit.
One of the new studies, which involved an international team led by Miguel de la Hoya, a pathologist at the molecular oncology laboratory, Hospital Clínico San Carlos in Madrid, explored the use of protein stability measures to improve classification of BRCA1 missense variants.
The other, led by investigators at the University of Queensland, Australia, used similar analyses to improve classification of TP53 missense variants.
Currently, variant classification, especially in large consortia like ENIGMA — of which both de la Hoya and Richardson are members — is largely focused on changes at the DNA and mRNA level.
Although some elements of structural biology have been incorporated into the ACMG/AMP guidelines since their inception in 2015, "the direct implementation of structural analysis using more complex properties, such as features derived from three-dimensional structures, is largely absent in guidelines despite the apparent benefit for variation interpretations," Richardson and coauthors wrote in their commentary.
Amanda Spurdle, a professor at Australia's QIMR Berghofer institute and the TP53 study's senior author, said that unlike companies such as Ambry, which have dedicated structural biologists on staff, the academic community has lacked the expertise to easily incorporate protein structural features on its own into efforts to establish reliable classifications based on gene variant effects.
However, new user-friendly, web-based tools have recently made it easier to integrate structural features into the toolkit for variant classification and curation.
As part of ENIGMA's recent work, de la Hoya raised the question of taking advantage of these tools, including a web-based program called AlphaFold, which predicts protein structure from amino acid sequence.
The result was the two studies, both of which were done in collaboration with Ambry.
"We got their expert advice and managed to come up with something that showed we could practically change the way we can be doing our variant curation," Spurdle said. "We haven't taken it [into the consortium] yet because the papers weren't published till this week, and there probably is more work that has to be done … but we've laid a path forward," she added.
"In ENIGMA, we have tended to ignore all this biology, but I think it's important, not only in our group but for the ACMG classification system as a whole," said de la Hoya.
"I had been thinking, 'How is it possible that we are not taking advantage of all the structural biology knowledge that is there to help classify variants?' That was really shocking for me," he added.
"I think of these two papers as taking a step to show that although it isn't easy, it is valuable, and it is possible."
In his team's BRCA1 study, de la Hoya tested the value of incorporating structure-based evidence, including relative solvent accessibility, folding stability, and pathogenicity predicted by a program called AlphaMissense, which infers variant effects from structure.
The team calculated likelihood ratios towards pathogenicity/benignity provided by a handful of different computational tools, both individually and combined, and performed a clinical validation of their findings using the large-scale BRIDGES case-control dataset.
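For readers unfamiliar with the arithmetic, a likelihood ratio of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the method used in either paper: the function names, the pseudocount, and the assumption that evidence sources are conditionally independent (which is what permits combining them by multiplication) are all hypothetical choices made here for clarity.

```python
from math import prod


def likelihood_ratio(path_hits, path_total, benign_hits, benign_total,
                     pseudocount=0.5):
    """Likelihood ratio towards pathogenicity for one binary evidence category.

    LR = P(evidence | pathogenic) / P(evidence | benign), estimated from
    counts in a reference set of already-classified variants.
    The pseudocount (an assumption of this sketch) avoids division by zero
    when a category is empty in a small reference set.
    """
    p_given_path = (path_hits + pseudocount) / (path_total + 2 * pseudocount)
    p_given_benign = (benign_hits + pseudocount) / (benign_total + 2 * pseudocount)
    return p_given_path / p_given_benign


def combine(lrs):
    """Combine several LRs by multiplication.

    Only valid if the evidence sources are conditionally independent,
    which has to be checked for real tools and is simply assumed here.
    """
    return prod(lrs)


def posterior_probability(prior, lr):
    """Convert a prior probability of pathogenicity and an LR into a posterior."""
    posterior_odds = prior / (1 - prior) * lr
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical counts: a tool flags 40 of 50 known pathogenic variants
    # and 5 of 50 known benign variants as destabilizing.
    lr_stability = likelihood_ratio(40, 50, 5, 50)
    # Hypothetical LR from a second, independent evidence source.
    lr_other = 2.0
    combined = combine([lr_stability, lr_other])
    print(f"stability LR: {lr_stability:.2f}")
    print(f"combined LR: {combined:.2f}")
    print(f"posterior (prior 0.10): {posterior_probability(0.10, combined):.2f}")
```

An LR above 1 counts towards pathogenicity and an LR below 1 towards benignity. The counts in the usage example are invented for illustration and do not come from the BRIDGES dataset or either study.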
Based on their results, they concluded that structure-based analysis improves upon other inputs in correctly predicting variant pathogenicity.
For TP53, Spurdle and her coauthors similarly found that protein stability measurement adds value for classification, in this case for a subset of missense and single amino acid deletion variants encoded by the TP53 gene.
"One thing we got extra from the TP53 paper is that we've been using a bioinformatic tool to predict the impact of in-frame indels, and they are just a nightmare," Spurdle said. "There tend not to be functional studies on them, and so we've tried to come up with all sorts of rules of thumb, but they don't always hold true." Incorporating structural biology, she added, "basically showed that the tool we had been using completely and utterly overpredicted impact on function."
Another important takeaway, Richardson said, is that a gene-specific approach to calibrating these computational tools is a necessity. The two studies used slightly different approaches because the starting data available in terms of protein structure was different for the two different genes.
In performing the studies, de la Hoya said, he had to face the fact that introducing structural biology into a variant classification system is not as simple as he might have anticipated. "At the end of the day, this classification system is about the kind of evidence you introduce into the system. You have to be able to calibrate it and to produce a likelihood ratio," he said.
"Thanks to all the data we have now on most of these genes, I'd say that it's easier to calibrate these tools," de la Hoya added. "But of course, it's still tricky because we are calibrating against functional data, not against clinical data."
Ambry's Richardson said the take-home message is that gene variant researchers don't have to have an advanced degree in structural biology now. "Miguel and Mandy both followed a process that anyone else can do."
In the company's companion commentary piece, she and her coauthors have tried to provide a broader overview to help guide future efforts in the academic community, including hurdles to look out for and a sense of best practices.
"To a certain group of people who find things like this sexy, AlphaFold is the new thing. Everybody's really excited about it, but not many people know how to use it or where to get started, and I think this is a great introductory starting-off point," Richardson said.
Spurdle said that her lab already has another study in the works, having been persuaded of the value of looking at protein structure to help predict variant effects.
"I think it's really important because you have a missense variant and that can be either stable or unstable. And you can imagine that the disease risk characteristics could be different because one [structure] is hanging around and the other one is disappearing," she said. "It's changed the way I'm doing my research."
De la Hoya noted that both his and Spurdle's studies were on loss-of-function variants, so the utility of these tools for studying gain of function could be an interesting next step.
In the meantime, Richardson said, it should be possible for the field to extrapolate and adapt Spurdle's and de la Hoya's methodologies to any other loss-of-function gene, "not just in cancer and not just in these two genes."