NEW YORK (GenomeWeb) – As diagnostic laboratories sequence more patients at risk for hereditary cancers, the demand for tools to conduct large data searches and aggregate multiple lines of evidence to classify variants' pathogenicity based on current guidelines is increasing as well. To help labs speed up that process, researchers from Memorial Sloan Kettering Cancer Center have developed Pathogenicity of Mutation Analysis, or PathoMAN, a freely available automated method for determining variant pathogenicity according to the guidelines published by the American College of Medical Genetics and Genomics (ACMG).
In a recent Genetics in Medicine paper describing the method, MSK researchers reported results from comparing PathoMAN's automated variant classifications to those reported by three large commercial diagnostic laboratories: Ambry Genetics, Invitae, and GeneDx. Among other results reported in the paper, PathoMAN achieved 94 percent concordance for pathogenic variants and over 81 percent concordance for benign variants. Overall, the researchers reported negligible discordance of PathoMAN's calls with manually curated variants, some loss of resolution, and some gain of resolution. The research was funded by MSK's Niehaus Center for Inherited Cancer Genomics.
As explained by its developers, PathoMAN automates the curation of germline genomic variants gleaned from clinical sequencing using the ACMG/AMP guidelines. It aggregates multiple tracks of genomic, protein, and disease-specific information from public sources that contain the evidence necessary for determining pathogenicity. This includes information on population allele frequencies and genomic annotations. The compiled data is used in 28 different categories that are part of the ACMG framework. Researchers use an aggregate score gleaned from evaluating these categories to classify variants as pathogenic, likely pathogenic, benign, likely benign, or of uncertain significance.
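As a rough illustration of how met criteria can be combined into one of the five classes, here is a minimal Python sketch of the combining rules in the 2015 ACMG/AMP guideline. It is not PathoMAN's code, which the article does not describe; the function name `classify` and the plain list of criterion codes as input are assumptions made for this example.

```python
from collections import Counter

# Evidence-strength prefixes used by the ACMG/AMP criterion codes.
VALID_PREFIXES = {"PVS", "PS", "PM", "PP", "BA", "BS", "BP"}


def classify(criteria):
    """Combine met ACMG/AMP criterion codes (e.g. "PVS1", "PM2") into a class.

    Hypothetical helper for illustration; it assumes each criterion is applied
    at its default strength.
    """
    counts = Counter()
    for code in set(criteria):  # ignore duplicate codes
        prefix = code.rstrip("0123456789")
        if prefix not in VALID_PREFIXES:
            raise ValueError(f"unrecognized criterion code: {code}")
        counts[prefix] += 1

    pvs, ps, pm, pp = counts["PVS"], counts["PS"], counts["PM"], counts["PP"]
    ba, bs, bp = counts["BA"], counts["BS"], counts["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence defaults to uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "uncertain significance"
    if pathogenic:
        return "pathogenic"
    if likely_pathogenic:
        return "likely pathogenic"
    if benign:
        return "benign"
    if likely_benign:
        return "likely benign"
    return "uncertain significance"


print(classify(["PVS1", "PM2"]))
```

In practice the hard part is deciding which criteria are met for a given variant, which is where the aggregated database evidence comes in; the combining step itself is a small lookup like the one above.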
When the MSK team began developing PathoMAN, there were no good tools available for germline variant interpretation, according to Vijai Joseph, an associate attending geneticist at MSK and one of the authors on the paper. Prior to developing the solution, Joseph and his colleagues were part of a National Institutes of Health project focused on analyzing data from breast cancer patients, and they were having trouble manually interpreting the large number of variants identified as part of the project. "We were on conference calls week after week trying to interpret [these variants] and we were not actually getting very far," he said. Manually interpreting these variants is both time- and energy-intensive, and at best the team could interpret just a handful of variants per week. Also, as data repositories grow and more people get tested, manually interpreting variants identified by these tests will be less and less feasible.
Although PathoMAN's framework is inspired by the ACMG guidelines, the tool also considers additional information such as gene class, somatic hotspots for cancer, and population-specific non-cancer controls from the Broad's Genome Aggregation Database (GnomAD) when making determinations of variant pathogenicity. PathoMAN's reports include over 100 fields of evidence and annotations for genomic variants, including details on the ACMG criteria used to make the classification.
PathoMAN's developers believe that their tool could simplify the variant classification process for molecular geneticists, genetic counselors, and variant curators. To showcase its accuracy, they used the pipeline to reevaluate manually curated germline variants from three commercial clinical testing laboratories. For the comparison, the researchers looked at commonly tested cancer susceptibility genes in multiple panels, many of which are in the ACMG's recommended gene list for secondary findings. They tested PathoMAN on more than 3,500 variants in 27 genes drawn from the labs' submissions to ClinVar. They also tested the algorithm on 300 pathogenic and likely pathogenic variants in 55 genes gleaned from four published studies on non-ACMG cancer risk genes.
The researchers reported 84 percent agreement in pathogenicity calls across the three labs, and 90 percent of PathoMAN's predictions agreed with these. Furthermore, PathoMAN reclassified as variants of unknown significance (VUS) over 5 percent of variants previously categorized as pathogenic or likely pathogenic and just under 18 percent of variants classified as benign or likely benign. Of those reclassified variants, 53 percent of the pathogenic/likely pathogenic variants and 74 percent of the benign/likely benign variants had been previously classified as VUS in ClinVar by at least one submitter. The tool also reclassified roughly two percent of VUS as pathogenic/likely pathogenic and roughly four percent of VUS as benign/likely benign.
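To make the arithmetic behind such figures concrete, the following Python sketch computes a simple concordance rate and the share of one class that a tool moved to VUS. The function names and the four-variant toy data are invented for illustration and are not drawn from the paper, whose exact counting method may differ.

```python
def concordance(reference, predicted):
    """Fraction of variants whose predicted class equals the reference class."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def moved_to_vus(reference, predicted, group):
    """Fraction of variants in `group` (per the reference) that were predicted VUS."""
    if len(reference) != len(predicted):
        raise ValueError("inputs must be of equal length")
    in_group = [p for r, p in zip(reference, predicted) if r in group]
    if not in_group:
        raise ValueError("no reference variants fall in the requested group")
    return sum(p == "VUS" for p in in_group) / len(in_group)


# Made-up toy calls: P = pathogenic, LB = likely benign, VUS = uncertain.
lab_calls = ["P", "P", "LB", "VUS"]
tool_calls = ["P", "VUS", "LB", "VUS"]

print(concordance(lab_calls, tool_calls))                # 0.75
print(moved_to_vus(lab_calls, tool_calls, {"P", "LP"}))  # 0.5
```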
At least two of the three labs whose calls were tested as part of the study now have proprietary automated pipelines that they use in their variant interpretation processes. Ambry Genetics, for example, has an automated framework for germline variant assessment that can be expanded to assess somatic variants, described in a PLOS One paper last September. "In the new era of genomic sequencing, the ability to provide highly accurate and scalable variant assessment is critical and this paper is an important step in the right direction," Tina Pesaran, senior manager for Ambry's variant assessment program, said in an email. "Many labs, including Ambry, have recognized this and have prioritized this." For labs without their own automated pipelines for variant curation, PathoMAN "could be a valuable tool," she said.
Similarly, Keith Nykamp, senior scientist at Invitae and head of the company's variant interpretation team, noted that Invitae has automated some aspects of its internal variant interpretation processes. The lab’s solution, called Sherloc, uses a similar approach to the one adopted by PathoMAN to extract relevant information for determining variant pathogenicity from databases, according to Nykamp. Automation "can really be hugely beneficial for improving consistency and efficiency of the variant curation step," he said. While the ACMG framework provides useful guidelines for determining pathogenicity, a major challenge for labs is extracting needed data from repositories like GnomAD and the Exome Aggregation Consortium (ExAC). "PathoMAN will certainly help some of the smaller clinical labs and genetic counselors and non-experts to consistently grab the data that they need to be evaluating variants," Nykamp said.
The lab representatives noted that the presence of discordant calls across their labs reflects the need for more collaborative variant interpretation efforts to reach consensus. In 2017, as part of a ClinGen-led initiative, four clinical laboratories including Ambry and GeneDx came together to try to resolve differences in variant interpretations previously submitted to ClinVar. The labs worked to identify the basis of interpretation differences and to investigate if data sharing and reassessment could resolve those differences, using a subset of variants submitted by at least two of the four participating labs. The labs documented the basis for the discordance, shared internal data, independently reassessed the variants under the ACMG-AMP guidelines, and compared the updated interpretations, publishing details of that study in Genetics in Medicine in 2017.
"We certainly applaud all efforts to automate germline [variant] classification because it is a very complex process," said Rachel Klein, managing director of GeneDx's MyGeneTeam. While GeneDx is working on automating certain aspects of its variant classification procedures, Klein noted that there is still a need for human expertise to resolve more complex cases. "Even within the [PathoMAN] paper, you can see that there are some differences in terms of classification and that's really where you need the expertise of the scientists and specialists in variant classification to help resolve that," she said.
Kathleen Hruska, co-director of GeneDx's hereditary cancer program, expressed similar sentiments in her comments about the MSK pipeline. "[They've] made a very informed first pass at attempting to apply the ACMG and AMP criteria in an automated fashion and they recognize some of the limitations of this approach," Hruska said. Commercial diagnostic labs sometimes use private information to make classification calls and Hruska noted that this could account for discrepant calls between the labs. "[The MSKCC researchers] worked very hard to look at some of the evidence for variants which were discrepant in ClinVar or between the labs [but] it can be a challenge to know what additional evidence may have been applied, much less make them machine learnable," she said.
All of the lab representatives interviewed expressed interest in working with other labs to reach consensus on the discordant variants highlighted in the paper. "Let's have a conversation and determine what information or what access to certain data or what related factors you might have to make a call a different way, and hopefully have some classification resolution that ultimately benefits the patient," Klein said.
Nykamp added that although Invitae continues to find ways to automate its variant identification process, the lab will still rely on experts to help with curation and understanding the role that genes play in disease. For one thing, human experts could help clear up discordant calls between labs. As noted in the PathoMAN paper, "there are still 15 percent of variants [where] the inter-lab concordance is still uncertain," probably due to conflicting data, he said. "That’s where our scientists and human experts' judgment really comes into play."
For their next steps, PathoMAN's developers plan to give the tool the ability to read the literature, Joseph said. His team is consulting with natural language processing and machine-learning experts to incorporate a module that lets PathoMAN parse and extract information from the scientific and medical literature. In addition, the group plans to work on improving PathoMAN's ability to predict splice variants. "We have a lot of RNAseq data coming out from programs such as GTEx [and] we want to integrate RNAseq data together with the variation data to see if we can figure out a pattern for the splicing aspect [to make] our predictions much more accurate," he said. These updates should help the MSK team categorize the balance of the variants that PathoMAN was unable to call.
The group is also open to working with academic and commercial labs that may be interested in automating their variant curation efforts. "We can't throw enough people at an expanding number of panels … there just aren't enough people to do that for thousands of samples," Joseph said, "which is why we need computational help."
Lab representatives also shared their thoughts on the kinds of information and functionality that they would like to see in PathoMAN. "There is a lot of granularity to the ACMG rules that are not captured in the current version of PathoMAN," Pesaran said. For example, for PVS1, Ambry's approach accounts "for the location of the alteration, whether the variant is subject to nonsense-mediated decay, alternate naturally occurring isoforms, and the weight of the code is modified as a result," she explained. "This would be something important to incorporate in a later version."
Meanwhile, other members of the academic cancer community have released their own solutions for automating the variant pathogenicity classification process. For example, researchers from the McDonnell Genome Institute at Washington University School of Medicine in St. Louis last year released the Characterization of Germline Variants, or CharGer, software tool. As its name implies, the free software is used for interpreting and predicting clinical variants' pathogenicity using the ACMG guidelines.