NEW YORK (GenomeWeb) – The oncology research and clinical communities now have an automated way to review and clean up the output of their somatic variant calling pipelines, thanks to new software developed by a team led by researchers at Washington University School of Medicine in St. Louis (WUSTL).
While current next-generation sequencing analysis pipelines successfully filter out many false variant calls that result from sequencing errors, misaligned reads, and the like, a manual inspection of the variant lists is often performed to catch any false positives still lurking in the data. That step is critical, since inaccurately identifying and classifying variants can result in missed treatment opportunities and other adverse effects.
Given that somatic variant lists can include thousands of calls, reviewing each individual call is both time- and labor-intensive. The developers of the new Deep Somatic Variant Refinement (DeepSVR) software are betting that their solution can take some of that pain away. Full details of the method are provided in a paper published this week in Nature Genetics.
This paper is the first peer-reviewed publication to offer a computational solution that addresses issues with manually reviewing somatic variants. "Manual review remains the gold standard for evaluating and validating variants that are called from different sequencing pipelines," said coauthor Obi Griffith, an assistant professor of medicine and assistant director of the McDonnell Genome Institute at WUSTL. Typically, trained experts look at the data and evaluate the variant calls based on their knowledge and expertise. This is true whether the analysis is performed as part of research into novel cancer drivers or in the context of a diagnostic laboratory seeking to provide treatment guidance for an oncology patient.
But exactly how the data is reviewed is somewhat opaque. To date, there is little published literature describing the nuances of manual review processes. Many research publications mention that investigators manually reviewed variant caller output but provide little to no detail about the specific postprocessing protocols and procedures that were used. With no real clarity around how other labs perform manual review, reproducing their findings can be challenging.
Moreover, there is variability in individual reviewers' approaches, as well as a normal attrition in attention that occurs over time, according to DeepSVR's developers. "People trust human eyes and effort to make a final determination but at the same time … don't really want to admit how subjective that can be. It raises some questions about how reliable that process is," Griffith said.
Attempting to start a conversation around standardizing the manual review process, some of the researchers involved in the Nature Genetics study last month published a separate paper in Genetics in Medicine detailing the standard operating procedures used in Griffith's lab at WUSTL to manually review somatic SNVs and indels in sequence data from paired tumor and normal samples. By sharing their approach, the researchers hoped to improve reviewer accuracy and reduce inter-reviewer variability in the variant clean-up process.
In one sense, the SOP publication grew out of the group's efforts to apply machine learning techniques to the variant postprocessing problem. As Griffith explained, one important finding from their initial research using machine learning algorithms was that the algorithms accounted for the reviewer's identity in their calculations of variant veracity. "That led us to do this inter-reviewer variability analysis where we [saw] that there is a big difference [from] one individual to another [in terms of] how they are classifying these variants and their accuracy," he said. In publishing the SOPs, "what we wanted to do was lay down what exactly our procedure is and what is good practice for doing this manual review."
This is important, in part, because although DeepSVR reviews variant lists in far less time than a human reviewer would, its developers are not claiming that it can or should replace manual review entirely. Furthermore, even if the same operating procedures were deployed by all labs in the same way, the process would still take too much time, since variant callers can generate thousands of variants per cohort. "If we accept that some amount of manual review is still important in the field and a clinical standard, then it's valuable to have this SOP," Griffith said. "What the machine learning method allows us to do is massively reduce the size of that problem."
And the time savings are substantial, according to numbers reported in Nature Genetics. Using data from a breast cancer study in which more than 10,000 variants were identified, the researchers extracted a list of around 9,000 variants for manual review. At a rate of 70 to 100 variants per hour, an expert reviewer would need 90 to 130 hours to go through that list. Applying DeepSVR to the dataset would eliminate the vast majority of the variants, leaving only about 500, or roughly 5 percent of the list, to be reviewed by human eyes. In that scenario, an expert reviewer could complete the task in about five hours.
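The arithmetic behind those figures is easy to reproduce. The Python sketch below uses the counts and review rates cited in the article; the script itself is purely illustrative:

```python
# Back-of-the-envelope check of the review-time figures cited above.
variants_for_review = 9_000        # variants extracted for manual review
rate_low, rate_high = 70, 100      # variants an expert reviews per hour

print(f"Full manual review: {variants_for_review / rate_high:.0f}"
      f"-{variants_for_review / rate_low:.0f} hours")   # ~90-129 hours

# After automated refinement, only ~500 variants remain for human review.
remaining = 500
print(f"Post-DeepSVR review: {remaining / rate_high:.0f}"
      f"-{remaining / rate_low:.0f} hours")              # ~5-7 hours
```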
To develop DeepSVR, the researchers relied on a training dataset of some 41,000 variants drawn from 440 sequencing cases spanning nine different cancer types and 21 studies. The dataset included both solid and hematological cancer cases, and all of the cases included paired tumor-normal sequence data. The variants had already been called and reviewed by experts, and each had been grouped into one of four categories: confirmed somatic variant, failed variant, ambiguous variant, or germline variant.
For each variant in the training set, the researchers derived several features intended to capture most of the information that manual reviewers look for when reviewing a list of variants, said Joshua Swamidass, an associate professor in the laboratory and genomic medicine division at WUSTL and one of the authors on the paper. These included features such as cancer type, sample type, mapping quality, depth of coverage, and so on. The researchers then built three models to fit those descriptors and used them to try to classify the variants in the training dataset as the expert reviewers had done. They then validated the algorithms on several independent datasets.
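To make that workflow concrete, here is a minimal scikit-learn sketch of the general approach the paragraph describes: encode expert-labeled variants as feature vectors, fit a classifier, and use it to label new calls. The feature names and the choice of a single random-forest model are illustrative assumptions, not the paper's exact features or its three models:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature table: one row per called variant, with the kinds of
# evidence a manual reviewer would inspect. Column names are illustrative.
variants = pd.DataFrame({
    "tumor_depth":     [120, 45, 200, 15, 88, 60],
    "normal_depth":    [100, 50, 180, 10, 92, 55],
    "tumor_vaf":       [0.42, 0.04, 0.31, 0.47, 0.02, 0.50],
    "mapping_quality": [60, 22, 58, 35, 20, 59],
    # Expert label: one of the four review categories from the training set
    "label": ["somatic", "fail", "somatic", "germline", "fail", "germline"],
})

X = variants.drop(columns="label")
y = variants["label"]  # somatic / ambiguous / fail / germline

# Fit one classifier here; the paper fit and compared three separate models.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classify a new call the way an expert reviewer would label it.
new_call = pd.DataFrame([{"tumor_depth": 90, "normal_depth": 85,
                          "tumor_vaf": 0.03, "mapping_quality": 21}])
print(model.predict(new_call))  # e.g., ['fail']
```

In practice, categorical descriptors such as cancer type and sample type would need to be numerically encoded (for example, one-hot encoded) before fitting.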
"I think it hits a real pain point for cancer genomics both on the research and clinical side," Swamidass said. "[Manual review] is a major source of variability between labs so I think that there is going to be either adoption of this tool or adoption of other tools that try and tackle this problem."
The slow pace of manual review is certainly something that many in the community are concerned about, according to Shashikant Kulkarni, professor and vice chair of molecular and human genetics at Baylor College of Medicine and chief scientific officer and senior vice president at Baylor Genetics. While the Nature Genetics paper highlights reviewer subjectivity and bias as an important reason to automate, Kulkarni isn't convinced that this is the strongest argument. "We know that when human beings are involved there would be an inherent bias," he said in an interview. But in terms of those biases affecting patient care in a real way, "I have to see additional data to believe that the subjectivity would lead to change in patient management."
In his view, one of the main problems is a shortage of trained and board-certified medical geneticists and genomic scientists to go through the labor-intensive review process. The situation becomes even more intractable when considering efforts to scale genomics-based biomedicine to the population level and to expand these techniques beyond oncology to things like germline disorders. And this is the sweet spot for machine learning and artificial intelligence-based methodologies like DeepSVR. "There is no way to scale with the workforce we have," he said. "In order to scale, these tools are absolutely important, and I think this is game changing."
Kulkarni's group at Baylor is also working on applying machine learning to the manual review problem. His team presented some of this work at the American Society of Human Genetics meeting last month in San Diego. They are using some of the same methodologies employed in DeepSVR, but their efforts focus not just on cancer but also on whole-genome sequencing and germline disorders. "It's a fantastic way to automate variant interpretation and it can [lower] the costs of the dry portion of the lab," he said.
Kulkarni believes that methods like DeepSVR are the future of manual review; however, there is still a lot of work to do. Like DeepSVR's developers, he highlighted messy data as one of the impediments to applying machine learning techniques in this context. "Training datasets today are not there yet in terms of quality to use machine learning approaches. There's lots of noise and caveats that we need to figure out," he said. For example, there are situations where a single variant might appear in six different clinical reports; the machine learning algorithm might read that as six different patients. Furthermore, there are potential legal issues to consider. "If we make this completely automated, who is responsible for it?" he said. "If there is an error, is it machine learning to be blamed or the professional who puts their name on the report?"
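That particular failure mode can be guarded against at the data-preparation stage by collapsing repeated reports of the same variant into a single training example. A minimal pandas sketch, with a hypothetical report table and column names, might look like this:

```python
import pandas as pd

# Hypothetical report table: the same variant appears in multiple clinical
# reports and would otherwise be over-weighted as independent examples.
reports = pd.DataFrame({
    "chrom":     ["7", "7", "7", "17"],
    "pos":       [55259515, 55259515, 55259515, 7577120],
    "ref":       ["T", "T", "T", "C"],
    "alt":       ["G", "G", "G", "T"],
    "report_id": [101, 102, 103, 104],
})

# Collapse to one row per distinct genomic change, keeping a report count
# instead of duplicate rows.
dedup = (reports
         .groupby(["chrom", "pos", "ref", "alt"], as_index=False)
         .agg(n_reports=("report_id", "nunique")))
print(dedup)  # two unique variants, one of them seen in three reports
```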
Besides incorporating DeepSVR into their internal variant review process, the WUSTL researchers are seeking to apply machine learning to another important use case. "Here we're dealing with this ideal situation of having tumor-normal matched pairs, which is what you want when you are doing cancer studies. But a lot of the time in the field, for technical reasons, you don't have normal tissue available," Griffith said. "A really interesting next application would be to see whether we can take this approach and develop a useful model for the tumor-only situation. That would really capture the other half of the field that's dealing with this problem where they unfortunately don't have a matched normal."