NEW YORK (GenomeWeb) – Researchers have recalibrated existing in silico algorithms using functional data for hundreds of BRCA1 and BRCA2 missense variants, a step they believe can improve quantitative variant classification approaches.
They also used these optimized in silico models to develop metapredictors, and applied them to more than 30,000 possible variants in these two genes to assess whether they are likely to be pathogenic or neutral. The resulting list of predicted classifications, which the team published last week in Genetics in Medicine, may be a resource for genetic testing labs trying to prioritize which variants of uncertain significance (VUS) need further workup.
In the paper, researchers led by Fergus Couch and Steven Hart from the Mayo Clinic described an approach for predicting the pathogenicity of BRCA1 and BRCA2 variants using functional evaluation and in silico analysis. They hope the methodology they've demonstrated in the paper will help expert bodies, such as the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA), improve existing quantitative variant classification models that incorporate in silico methods.
Couch, a leading researcher of breast cancer genes, receives calls and emails from genetic counselors, physicians, and patients from all over the world hoping he can provide a little more insight about a VUS to help them lean one way or another about whether the variant is pathogenic or benign. "People are just lost and they don't know what to do when they have these VUS," Couch said. "They're all just looking for that extra bit of information to help them make a decision about how to manage risk."
Most BRCA1 and BRCA2 genetic variants known to cause breast and ovarian cancer shorten the coding sequence of the protein. However, it has been challenging for the field to determine how missense variants — single base pair changes that produce different amino acids than what's usually produced at those positions — impact protein function.
Thousands of missense BRCA1/2 variants have been identified by clinical genetic testing, but the majority are classified as VUS in public databases. "We have thousands of these genetic variants that clinicians and patients don't know what to do with," said Couch, estimating that there are only 22 missense variants in BRCA1 and 15 missense variants in BRCA2 in public databases that have been classified as deleterious.
Traditionally, researchers have tracked the impact of genetic variants within families to determine whether they are disease causing or neutral. However, for the numerous missense variants in BRCA1/2 identified to date, there is often not enough family data to definitively establish whether or not they are associated with a heightened risk of cancer.
In an effort to address this problem, ENIGMA, a consortium of experts focused on classifying variants in BRCA1/2 and other breast cancer genes, developed a multifactorial quantitative model for predicting whether a variant could be pathogenic or neutral based on how it segregated with cancer in families, the history of cancer in these families, the pathology of tumors, whether the VUS co-occurred with other pathogenic variants in individuals, and the sequence-based in silico prediction model Align-GVGD.
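As a rough illustration of how a multifactorial model of this kind can combine evidence, the sketch below treats the in silico prediction as a prior probability of pathogenicity and multiplies the prior odds by a likelihood ratio for each independent line of evidence. This Bayesian formulation, the function name, and every number shown are assumptions made for illustration; they are not taken from ENIGMA's published model.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine a prior probability with independent likelihood ratios.

    Assumes each likelihood ratio compares how probable the evidence is
    if the variant is pathogenic versus if it is neutral.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical evidence for one variant (made-up values, illustration only).
evidence = {
    "segregation": 4.0,
    "family_history": 2.5,
    "tumor_pathology": 1.8,
    "co_occurrence": 0.9,
}

# Hypothetical prior derived from a sequence-based in silico predictor.
p = posterior_probability(prior=0.3, likelihood_ratios=list(evidence.values()))
print(f"posterior probability of pathogenicity: {p:.3f}")
```

A likelihood ratio above 1 pushes the variant toward pathogenic and a ratio below 1 pushes it toward neutral, so a line of evidence with no data contributes a ratio of 1 and leaves the estimate unchanged.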
To date, ENIGMA experts have classified more than 6,100 BRCA1/2 variants out of around 20,000 variants amassed in the BRCA Exchange portal. However, Heidi Rehm, chief genomics officer at Massachusetts General Hospital's Department of Medicine, pointed out that ENIGMA's multifactorial likelihood ratio has been challenging for individual labs to apply because they haven't amassed multiple lines of evidence on thousands of variants in the way the expert body has been able to via collaborations with researchers around the world.
"Where this model has been challenging is when there is incremental new published data about a particular mutation in a family," Rehm said. "How does a clinical lab write a clinical report on a variant that has some published case reports but has never been tackled by ENIGMA?"
And even with ENIGMA's multifactorial model, Couch noted that thousands of BRCA1/2 VUS have yet to be classified due to the dearth of family data. "We need about 10 families with the same VUS, with multiple individuals for each family tested for this VUS, in order to have enough statistical power to classify [it] as pathogenic or neutral," he said.
In the absence of family data, researchers have turned to functional assays to understand how a variant might impact the protein, but these are time- and resource-intensive to perform. In Couch's lab, for example, researchers will run a functional assay for a BRCA2 variant nine different times. "We don't want to make a mistake," he said. "If I make a mistake in my functional assay and I publish the wrong result, a patient might have prophylactic surgery on that basis."
Improving in silico models
Given these challenges with regard to classifying missense variants, Couch and his colleagues have been working on improving the accuracy of existing in silico models so they can be more useful within quantitative variant classification models for BRCA1/2. As described in Genetics in Medicine, researchers used assays previously shown to have 100 percent specificity and sensitivity for pathogenic missense variants in the BRCT domain of BRCA1 and in the DNA binding domain of BRCA2, and determined the functional status of 248 BRCA1 and 207 BRCA2 missense variants.
Even though an expert body like ENIGMA may not agree with the classification of these variants based purely on functional data, Couch's team accepted these determinations as a gold standard for the purposes of their research and used them to test the predictive performance of commonly used in silico models. Some of the models performed in line with the cutoffs previously published in the literature, while others did not, and the researchers recalibrated the cutoffs for these models based on their predictive performance.
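The recalibration idea can be sketched in a few lines: treat the functional assay calls as ground truth, scan candidate cutoffs for a predictor's score, and keep the one that best separates the two classes. The article does not say which performance measure the researchers optimized, so the use of Youden's J (sensitivity plus specificity minus 1) is an assumption here, as are the function names and the toy scores, which are made up.

```python
def youden_j(scores, labels, threshold):
    """Sensitivity + specificity - 1, calling scores >= threshold pathogenic.

    Assumes labels contains at least one pathogenic (True) and one
    neutral (False) variant, otherwise a rate would be undefined.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def recalibrate_cutoff(scores, labels):
    """Return the observed score that maximizes Youden's J."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: youden_j(scores, labels, t))


# Toy data: predictor scores and functional assay calls (True = pathogenic).
scores = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
labels = [False, False, False, True, True, False, True, True]

default_cutoff = 0.50  # hypothetical literature default
new_cutoff = recalibrate_cutoff(scores, labels)

print("default J:", youden_j(scores, labels, default_cutoff))
print("recalibrated cutoff:", new_cutoff)
print("recalibrated J:", youden_j(scores, labels, new_cutoff))
```

In this toy set the default cutoff misses two pathogenic variants that the lower, recalibrated cutoff captures, which is the kind of gene-specific shift the recalibration is meant to find.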
"We've had these in silico predictors for a long time, and everyone in the business is aware of them, but nobody has any clue if they're any good or not," Couch said. Even though Align-GVGD is an in silico method that ENIGMA uses in its multifactorial quantitative model, "we've never really known how good or bad it really was, because the number of known deleterious and known neutral variants were quite small for both genes," he noted.
In the paper, Couch's team reported which of the optimized in silico methods performed best for BRCA1 and BRCA2 variants. For example, they noted that Align-GVGD worked well for predicting the function of BRCA1 variants but was not among the best-performing models for BRCA2 variants. Based on this finding, Couch will urge experts attending ENIGMA's annual meeting this week in Edinburgh, Scotland, to replace Align-GVGD in its quantitative variant interpretation framework for BRCA2 with another method that performed better. (Couch is a member of ENIGMA.)
Couch's team also combined these optimized in silico models into metapredictors using two different statistical methods: random forest and the naïve voting model (NVM). "Overall, the predictive abilities of the random forest and NVM models showed substantial improvement over individual in silico prediction methods using default parameters, and modest improvements over the best-performing individual in silico methods optimized at thresholds specific to BRCA1 and BRCA2," the researchers wrote in the paper.
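A voting metapredictor of the kind described can be pictured as follows: each individual predictor casts a vote by comparing its score with its own gene-specific cutoff, and the votes are tallied. The article does not spell out how the NVM tallies or weighs votes, so the simple majority rule below is an assumption, and the predictor names, cutoffs, and scores are hypothetical placeholders rather than real tools or published values.

```python
def naive_vote(scores: dict[str, float], cutoffs: dict[str, float]) -> float:
    """Fraction of predictors whose score meets or exceeds its own cutoff."""
    votes = [scores[name] >= cutoffs[name] for name in cutoffs]
    return sum(votes) / len(votes)


def classify(scores: dict[str, float], cutoffs: dict[str, float],
             min_fraction: float = 0.5) -> str:
    """Call a variant deleterious when more than min_fraction of predictors agree."""
    if naive_vote(scores, cutoffs) > min_fraction:
        return "predicted deleterious"
    return "predicted neutral"


# Hypothetical gene-specific cutoffs for three placeholder predictors.
brca2_cutoffs = {"predictor_a": 0.30, "predictor_b": 0.65, "predictor_c": 0.50}

# Hypothetical scores for a single missense variant.
variant_scores = {"predictor_a": 0.42, "predictor_b": 0.70, "predictor_c": 0.10}

print(naive_vote(variant_scores, brca2_cutoffs))
print(classify(variant_scores, brca2_cutoffs))
```

Because every predictor votes against a cutoff tuned for the gene in question, a separate cutoff table would be needed for BRCA1 and for BRCA2, which is consistent with the article's point that the optimized models did not transfer well to other genes.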
The researchers applied these new metapredictors to every possible alteration that could occur in BRCA1 and BRCA2, and published the classification predictions for more than 30,000 variants. "There is around 10 percent error in that," noted Couch, "and we haven't done a prospective study to determine this is all correct."
Couch's group also focused on variants in specific domains of BRCA1 and BRCA2, and so it's not clear how well these models will perform for variants in other regions of these genes. And when researchers applied the BRCA1 and BRCA2 metapredictors to variants in other genes, they didn't work very well, suggesting that these optimized models are not generalizable.
"We're not claiming this is a clinical result, and we're not claiming it's perfect," Couch said. "But it's pretty good, and it'll help a lot of people make choices as to which way they should go."
'Useful contribution'
The work led by Hart and Couch "is a useful contribution in that it provides an improved method for assessing missense mutations, which are some of the hardest to interpret," agreed Robert Nussbaum, chief medical officer at genetic testing firm Invitae.
Invitae has its own variant interpretation method, called Sherloc, which employs a quantitative system for assessing the pathogenicity of variants. However, Nussbaum acknowledged that the company pays "very little attention" to current in silico methods right now because of the limitations that Couch's team pointed out in their paper. "The output of Hart et al.'s sequence-based computational prediction models fits well into a system that uses a quantitative approach to assessing pathogenicity," he said.
Couch also sees the published list of predicted classifications as a way to prioritize which variants in BRCA1 and BRCA2 need further investigation. "The goal is to have a shortcut to better understanding what a variant is," he said. "Then, we can set the variants aside that need additional workup with family and functional data."
He and his colleagues are starting to look into the variants predicted to be deleterious based on these metapredictors. "It's going to take a long time to get through these things," he said. But now, when a patient comes in with a missense VUS in BRCA1 or BRCA2, and there is a lack of family, pathology, or functional data, "at the very least we have something to start with," Couch noted.
This work could also be useful to ClinGen — an NIH-funded effort to build an authoritative central resource on the clinical relevance of genes and variants — which has a Sequence Variant Interpretation Working Group focused on refining the American College of Medical Genetics and Genomics' guidelines with quantitative approaches.
The current ACMG variant interpretation guidelines, which are widely used by clinical labs, note that in silico models should be applied carefully and not be the sole source of evidence used to make clinical assertions about a variant. The group further recommends that evidence from in silico predictors be considered supporting evidence on the pathogenicity or neutrality of a variant only if all the models used agree on the classification.
This guidance is somewhat vague, said Rehm, who is last author on the ACMG guidelines and a principal investigator of ClinGen, because there are so many in silico models out there. "Some work the exact same way, and … some work better for some genes versus others," she said. "The ideal thing is to pick out a predictor that is well validated for specific genes."
Expert panels within ClinGen focused on classifying variants in genes linked to specific diseases are working on finding the best in silico predictor for that gene of interest. According to Rehm, within ACMG, experts are also trying to find a single metapredictor that could be applicable across genes and provide some information when there aren't gene-specific in silico models.
Reflecting on the work going on within ClinGen, ACMG, and ENIGMA (which is an expert group within ClinGen), Couch noted that all these expert groups are moving toward similar goals.
"They're heading toward us, but we're also heading toward them," said Couch. "We're trying to reach back and look at the ACMG [variant interpretation] guidelines and see how we can adapt our rules to better fit their model, because clinicians around the world are very familiar with that ACMG model."