As Genetic Variant Reanalysis Needs Grow, Researchers Turn to Automation


NEW YORK — Researchers are increasingly turning to automated approaches to help tackle the increasing number of genetic variants that have to undergo reanalysis.

As more is learned about links between various gene variants and phenotypes, those variants' classifications as pathogenic, likely pathogenic, likely benign, benign, or uncertain can change and, in turn, alter and inform clinical diagnoses. Because of this, clinicians or labs need to regularly review previously identified genetic variants.

"Literally, we keep finding … hundreds of these genes. So, if your patient wasn't diagnosed yesterday, the gene that causes the disease could be published tomorrow," said Gill Bejerano, a professor of developmental biology, computer science, and pediatrics at Stanford University.

But there are a number of challenges to conducting such reanalyses, many of which come down to time. It takes time to sift through and read papers describing new gene- or variant-phenotype connections, and it takes time to go through the pool of patient data that needs to undergo reanalysis.

"With every single day, there is an increasing number of patients without a diagnosis," Joo Wook Ahn, lead bioinformatician for NHS East Genomics, said. "So, if you had to go back and try and look at them or reanalyze them, that's an ever-increasing amount of work."

Because of that, researchers such as Bejerano and Ahn are trying to automate as much of the reanalysis process as they can, making it more feasible to run again and again. "Without any automation, that's a huge amount of time," Ahn added.

Many of these approaches rely on tools that integrate established databases of genomic variation, such as ClinVar, with in-house databases. Others rely on separate tools that scan the literature for relevant papers and flag variant reclassifications that might be relevant for a certain set of patients.

For instance, Heidi Rehm, the chief genomics officer at Massachusetts General Hospital, and her colleagues developed a software suite called GeneInsight to support clinical laboratory genetic and genomic testing. Part of the aim of the suite, which has since been purchased by Sunquest and renamed Mitogen, is to alert physicians or genetic counselors when the knowledgebase is updated with new information about patients' variants. Those alerts are delivered through a component of the suite called GeneInsight Clinic.

Similarly, Ahn and his colleagues developed a bioinformatic approach dubbed TierUp. It relies on the National Health Service's Genomic Medicine Service bioinformatics infrastructure and uses the Genomics England Clinical Interpretation Portal-API to access case details while also drawing on crowdsourced, curated gene-disease association data from the PanelApp database to identify variants with new evidence linking them to disease.
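
To illustrate the general idea behind this kind of panel-driven reanalysis, here is a minimal Python sketch. It is not TierUp's actual code and does not call the Clinical Interpretation Portal or PanelApp APIs: the Variant class, the flag_for_review function, the gene names, and the dictionary-based panel snapshots are all hypothetical. It assumes each gene carries a single confidence rating and that "green" is the level that makes a gene worth reviewing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A previously reported variant (hypothetical structure for illustration)."""
    case_id: str
    gene: str
    hgvs: str


def flag_for_review(variants, old_panel, new_panel, actionable="green"):
    """Return variants whose gene was not actionable at the original
    analysis but is actionable in the current panel snapshot."""
    flagged = []
    for v in variants:
        was = old_panel.get(v.gene)   # None if the gene was absent back then
        now = new_panel.get(v.gene)   # None if the gene is still absent
        if now == actionable and was != actionable:
            flagged.append(v)
    return flagged


# Hypothetical panel snapshots: gene -> confidence rating.
old_panel = {"GENE_A": "amber", "GENE_B": "green"}
new_panel = {"GENE_A": "green", "GENE_B": "green", "GENE_C": "green"}

variants = [
    Variant("case-1", "GENE_A", "c.100A>G"),  # gene upgraded amber -> green
    Variant("case-2", "GENE_B", "c.200C>T"),  # gene was already green
    Variant("case-3", "GENE_C", "c.300G>A"),  # gene newly added as green
    Variant("case-4", "GENE_D", "c.400T>C"),  # gene still not on the panel
]

flagged = flag_for_review(variants, old_panel, new_panel)
print([v.case_id for v in flagged])  # ['case-1', 'case-3']
```

Only variants in genes whose evidence level has risen are returned, so cases with no new evidence generate no notification. That mirrors the emphasis on specificity that Ahn describes, though a real pipeline would also need to handle inheritance patterns, panel versions, and case metadata.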

Bejerano's lab at Stanford, meanwhile, developed a system dubbed AMELIE, or Automatic Mendelian Literature Evaluation, that analyzes the scientific literature and matches findings to patients with suspected Mendelian conditions. The AMELIE knowledgebase is automatically developed and updated based on articles about Mendelian diseases identified in PubMed. Its classifier then estimates whether an article may contain a diagnosis for a particular patient.

In studies, these tools appear to both identify variants with reclassifications and streamline the reanalysis process.

As Rehm and her colleagues reported recently in Genetics in Medicine, they implemented GeneInsight as part of the Electronic Medical Records and Genomics (eMERGE) phase III program. Its reanalysis of 1,855 variants led to the reclassification of 45 of them, or about 2 percent, which affected 67 participants. Most of these reclassifications were upgrades in pathogenicity, from a variant of uncertain significance to likely pathogenic or pathogenic, and clinicians were then notified of them.

"They would get real-time notification on changes quicker than if we had just relied on the manual process," she said. The average reclassification timeframe was about 22 months.

Ahn and his colleagues applied their TierUp approach to nearly 950 individuals with an undiagnosed rare disease from the 100,000 Genomes Project. TierUp, as they reported in Genetics in Medicine in December, identified 410 variants in new disease genes and flagged them for further consideration in slightly more than an hour.

The results were passed along to the clinical rare disease testing laboratory for review and led to updated clinical reports for five cases, including two variants that were reclassified to be pathogenic and one that was reclassified as likely pathogenic.

"Our slant on [reanalysis] was to maximize the specificity," Ahn noted. "We wanted fewer notifications coming from the algorithm. But most of the time, if there is a notification, it is a true positive."

Additionally, Bejerano and his colleagues tested their AMELIE approach by applying it to a cohort of 110 patients whose causative genes were published between January 2012 and May 2018. For this analysis, they started with a version of AMELIE that only had access to article data from 2011 or earlier and then updated it with the newer papers. This approach identified about 80 percent of the known reanalysis diagnoses, most of them soon after the first diagnostic article appeared, as the researchers reported in a preprint posted to medRxiv. This, the researchers noted, translated to about one alert per patient per year.
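
The time-sliced evaluation described above can be sketched roughly as follows. This is a simplified illustration, not AMELIE itself: it replaces the system's classifier with a plain gene-name match, and the function name, patient IDs, gene names, and dates are invented for the example.

```python
from datetime import date


def first_alert_dates(patients, articles, cutoff):
    """Replay the literature in publication order after a cutoff date.

    patients: dict of patient_id -> set of candidate gene names
    articles: list of (publication_date, gene) tuples
    Returns patient_id -> date of the first post-cutoff article that
    introduces a candidate gene, or None if no such article exists.
    """
    # Genes already described on or before the cutoff are "known" and
    # would not trigger a new alert in this toy setup.
    known = {gene for pub, gene in articles if pub <= cutoff}
    alerts = {pid: None for pid in patients}

    later = sorted(a for a in articles if a[0] > cutoff)
    for pub, gene in later:
        if gene in known:
            continue  # only the first article about a gene counts
        known.add(gene)
        for pid, genes in patients.items():
            if alerts[pid] is None and gene in genes:
                alerts[pid] = pub
    return alerts


# Hypothetical literature and patients.
articles = [
    (date(2010, 5, 1), "GENE_A"),
    (date(2013, 3, 1), "GENE_B"),
    (date(2015, 7, 1), "GENE_C"),
    (date(2016, 1, 1), "GENE_B"),
]
patients = {
    "p1": {"GENE_B", "GENE_X"},
    "p2": {"GENE_A"},
    "p3": {"GENE_C"},
    "p4": {"GENE_Z"},
}

alerts = first_alert_dates(patients, articles, cutoff=date(2011, 12, 31))
found = sum(1 for d in alerts.values() if d is not None)
print(alerts["p1"], alerts["p3"], found)  # 2013-03-01 2015-07-01 2
```

Freezing the knowledgebase at a cutoff and then adding papers chronologically makes it possible to measure both how many diagnoses would have been recovered and how soon after publication each alert would have fired.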

These studies hint that automated reanalysis tools could be implemented in different settings. GeneInsight was used as part of the eMERGE study at various clinical sites that were supported by the MGH/Broad sequencing center, though Rehm noted that as a research effort, it wasn't fully integrated into their electronic health records there. TierUp has also been used by another large sequencing center in the UK, Ahn noted.

Even with new tools aimed at making variant reanalysis more automated, there are still stumbling blocks to their widespread use. Pengfei Liu, an assistant professor of molecular and human genetics at Baylor College of Medicine, who also developed a semi-automated reanalysis process, said that implementing reanalysis pipelines can be difficult.

More established diagnostic labs might be more reluctant to take on new processes like automated reanalysis pipelines because it may be tricky to fit them into their established workflow, especially as they have more and more data to sift through, Liu said. Doing so, he said, will take both time and money. "And there always has to be a painful decision of whether you want to adapt to the existing pipelines and your existing database, or you want to switch to a newer one so that you are more future-proof but you need more work to harmonize your historical data," he added.

There are also limited incentives for conducting reanalyses. Some labs have prioritized analyzing new cases over the reanalysis of older ones, and lab-initiated reanalysis is not always reimbursed. As a result, Liu noted, some solutions may need to be creative, such as having patients or providers subscribe to a reanalysis service. Another option, Ahn said, is to offer one or a few free reanalyses, though his lab does not.

Liu pointed out, though, that reanalysis tends to boost diagnostic yields, which labs could use to tout their services over their competitors'. He and his colleagues reported in the New England Journal of Medicine, for instance, that reanalysis could increase the molecular diagnostic rate of cases from 24.8 percent to 46.8 percent in a cohort of 250 individuals. Another study, published in Frontiers in Neurology by a different group, found that reanalysis could boost diagnostic yield among individuals with neurodevelopmental disorders by slightly more than 5 percent.

"If [labs] don't do it well, their clients are going to leave and go to a lab that is going to do it well," Rehm added. "It's a service; it's a business function, but it's also really critically important to the patients' healthcare."

How often such reanalysis should be performed is still an unanswered question. Ahn noted that patients likely want to know any updated information as soon as they can, as new data can affect their care. He added, though, that how often and when reanalyses are performed would also depend on when databases like PanelApp or ClinVar are updated. Rehm noted there is a multi-pronged approach for improving interpretations, including American College of Medical Genetics and Genomics guidelines for classifying variants and depositing those interpretations into ClinVar, and expert panels to try to resolve differences in interpretation.

But that timing and other questions could be worked out as automated reanalysis approaches are more widely adopted.

"Let's build the system, the infrastructure, so every single patient that comes in the door can take advantage of it," Ahn said.