Invitae Updates Variant Classification Platform With Machine Learning Tool


This article was updated to clarify the percentages of variants classified as benign, pathogenic, and likely benign that remained stable over an eight-and-a-half year period.

NEW YORK – As Invitae seeks to regain its footing in the wake of its recent bankruptcy filing, the company is highlighting progress in the computational biology arena, particularly with respect to genetic variant interpretation.

As part of that effort, the San Francisco-based company on Tuesday introduced a new machine learning-based variant interpretation tool called Clinical Variant Modeling (CVM), which will be presented at this week's annual conference of the American College of Medical Genetics and Genomics in Toronto. 

"Essentially, what [CVM] does," said Michael Korn, Invitae's chief medical officer, "is look at the clinical data we have, [to get] an idea of what a patient with a particular disease looks like."

The method analyzes all annotated genes and gene variants for given conditions and, for each gene associated with a condition, builds a model of patients who would be expected to manifest those conditions, assuming that each variant was causal. This results in a patient score that represents the probability that a patient is affected with the condition of interest. That information is fed into a Bayesian inference model that calculates a variant score representing the probability that a given variant is pathogenic. Finally, a panel of clinical genomic experts reviews a subset of the variant classifications to ensure that CVM is performing as expected.
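The flow from patient scores to a variant score can be illustrated with a toy calculation. The following is only a minimal sketch, not Invitae's implementation: the function name, the prior, and the `penetrance` and `background` rates are all hypothetical values chosen for illustration. It treats each carrier's patient score as soft evidence of being affected and updates the odds that the variant is pathogenic with Bayes' rule.

```python
import math


def variant_posterior(patient_scores, prior=0.1, penetrance=0.8, background=0.05):
    """Toy Bayesian update: probability a variant is pathogenic.

    patient_scores: per-carrier probabilities (0..1) of being affected.
    prior:          assumed prior probability of pathogenicity (hypothetical).
    penetrance:     assumed P(affected | pathogenic variant) (hypothetical).
    background:     assumed P(affected | benign variant) (hypothetical).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for score in patient_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("patient scores must lie in [0, 1]")
        # Likelihood of this carrier's evidence under each hypothesis,
        # averaging over whether the carrier is truly affected.
        like_pathogenic = score * penetrance + (1.0 - score) * (1.0 - penetrance)
        like_benign = score * background + (1.0 - score) * (1.0 - background)
        log_odds += math.log(like_pathogenic / like_benign)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Three carriers who look strongly affected push the score up.
    print(variant_posterior([0.9, 0.85, 0.95]))
    # Three carriers who look unaffected push it down.
    print(variant_posterior([0.1, 0.05, 0.2]))
```

In this sketch, carriers whose clinical picture matches the condition raise the variant score and carriers who look unaffected lower it; with no carriers at all, the score simply equals the prior.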

"With this," Korn said, "we are able to resolve a very significant number of variants of unknown significance."

Korn said that by applying CVM to Invitae's database of more than 4 million patients and more than 2 million DNA variants, the company was able to generate highly stable results, with more than 99 percent of variants originally classified as benign and pathogenic remaining unchanged over the 8.5-year study period.

At ACMG, Invitae presented the results of a functional study used to validate CVM. In a second poster, the company presented a model for genes associated with Lynch syndrome, the most common cause of hereditary colorectal cancer.

In this study, Invitae scientists generated variants for three Lynch-related mismatch repair (MMR) genes: MSH2, MLH1, and PMS2. These were introduced into cells so that each cell contained a single variant copy. Single-cell gene expression profiles were built via RNA-seq, and machine learning was applied to find gene expression patterns indicative of known pathogenic and benign variants. The functional impacts of the variants in these three genes were assessed using multiplex assays of variant effects (MAVE), a means of measuring the functional consequences of genetic variants. The resulting variant effect maps help streamline downstream clinical variant interpretation.

Simulations made from these data suggested that 64 variants in MSH2, 29 variants in MLH1, and 28 variants in PMS2 would receive updated clinical variant classifications.

The company estimates that these results may lead to some 18,000 patients tested by Invitae for Lynch syndrome receiving VUS reclassifications, resulting in an overall 24 percent reduction in VUS for these three genes.

CVM is the latest addition to Invitae's Generation variant interpretation platform, itself a consolidation of the firm's diverse variant interpretation engines. These include engines for predicting gene function, population frequency modeling, estimating molecular stability, and analyzing variants via evolutionary modeling.

"But all this is not possible without the human element," Korn said. "At each step, there are [subject matter] experts working on this."

Heidi Rehm, a human geneticist and genomic medicine researcher at the Broad Institute, commented that human oversight remains important for any machine learning-based variant interpretation engine, as certain key information types are structured in ways that leave them largely inaccessible to computer algorithms.

"Sometimes the most valuable information is a pedigree in a manuscript, [which] is not computable," she said. "I have to look at the figure and see if the variant segregated with disease, and in how many family members, and none of that data is accessible to a computer."

Nonetheless, Korn commented that Invitae has invested heavily in developing natural language processing systems to reduce the overall human workload of reviewing scientific papers.

Korn also said that the technology that Invitae gained through its acquisition of bioinformatics firm Jungla in 2019 underpins CVM and the Generation suite of tools generally.

"That [technology] has been integrated and was the starting point for all these developments," he said.

Given its clinical importance, variant interpretation is a competitive market. Among the larger competitors are Myriad Genetics, which has focused largely on oncology-related variants, and Qiagen, which has been actively applying its Qiagen Clinical Insight Interpret (QCI Interpret) software in a range of clinical decision support-oriented partnerships.

At the other end of the scale, smaller firms and startups such as Fabric Genomics and Nostos Genomics have also been making inroads.

Fabric Genomics has been active in the clinical decision support space, where its Enterprise platform aims to provide clinicians with a manageable set of gene variant information to help streamline their diagnostic decision-making process. Last year, the company teamed up with DNAnexus to develop an integrated workflow to improve turnaround time for rare disease and cancer diagnostics by interpreting genomic sequences, detecting clinically meaningful variants, and generating comprehensive patient reports.

German startup Nostos Genomics' Aion platform uses a proprietary classification algorithm to examine patterns of variation across genomic regions to identify regions that are more and less tolerant to variation. This, explained Chief Operating Officer Ansgar Lange, helps to avoid problems that might arise from simply determining whether or not a variant is present in databases such as gnomAD, an approach that can lead to issues when analyzing samples from populations that are underrepresented in genetic variation databases.
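The general idea of regional tolerance to variation can be shown with a simple ratio of observed to expected variant counts per region. This is only an illustrative sketch of that generic idea, not the Aion algorithm, which is proprietary; the function name, region labels, and counts are hypothetical.

```python
def regional_tolerance(observed_counts, expected_counts):
    """Return an observed/expected variant ratio for each region.

    A ratio well below 1 suggests the region carries less variation than
    expected (less tolerant); a ratio near 1 suggests it tolerates
    variation. Regions with no expected variants are skipped because the
    ratio is undefined.
    """
    ratios = {}
    for region, expected in expected_counts.items():
        if expected <= 0:
            continue
        ratios[region] = observed_counts.get(region, 0) / expected
    return ratios


if __name__ == "__main__":
    # Hypothetical counts for two made-up regions.
    observed = {"region_a": 2, "region_b": 10}
    expected = {"region_a": 10, "region_b": 10}
    for region, ratio in regional_tolerance(observed, expected).items():
        print(region, ratio)
```

A score of this kind depends on the shape of variation across a region rather than on whether one specific variant has been catalogued, which is why such approaches are less sensitive to gaps in reference databases.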

"With this approach," Lange said via email, "we observed good clinical performance across patients with rare diseases from different ethnicities (based on self-reported ethnicity) as part of our clinical validation study with Genomics England. Nevertheless, the reality is that substantial efforts need to be made to expand reference population databases to increase sample sizes and include more samples from populations underrepresented in these datasets."

Despite filing for bankruptcy last month, Korn said, Invitae remains committed to developing high-quality diagnostics tools, including by integrating "top-level" technology into future variant interpretation offerings.

"We expect that Invitae will be in a much better position when we come through this," he said, "and there will be much more internal ability to move all these things forward."