Wash U Team Shares Method for Mining Virus Signatures from Unmapped Clinical Cancer NGS Data


Researchers from Washington University School of Medicine in St. Louis have shared a method for mining the unmapped next-gen sequencing reads that are normally discarded from the growing cache of clinical cancer sequencing data, treating them as a hunting ground for viral pathogen sequences that might play a role in various cancers.

The group described the approach in a study in Experimental and Molecular Pathology this month, publishing data from a proof-of-concept experiment that used cancers with known viral associations to demonstrate the sensitivity of the method, as well as follow-up research looking for novel virus signatures in a group of 21 high-grade gliomas.

Eric Duncavage, the study's corresponding author, told In Sequence that the team became interested in looking for viruses in unmapped clinical sequencing reads as data from Wash U's targeted cancer sequencing program grew.

"We've sequenced close to about 2,000 clinical cancer specimens over the last two years," Duncavage said. "Typically about 98 and a half percent of the reads will map to the human genome, and so for the other 1.5 percent it's unclear what those reads represent."

"One of the things we were interested in was to see if we could use those off-target reads to find something, and we noticed several years ago that you could find viral sequences. … The idea of viral pathogen association in cancer has been around for at least 100 years, but there is only clear evidence in a very small subset of cancers that they are driven by viruses," said Duncavage. "It's possible there are others, but it's been difficult to prove convincingly."

"Since we [and others] have a whole bunch of clinical sequencing cases, this method is attractive [to investigate possible viral cancer associations] because it doesn’t really cost anything beyond CPU time and time to look at the data," he said.

To look for evidence of viruses in clinical cancer sequencing cases, the group adapted bioinformatics strategies used by Wash U's viral discovery group into an analysis method that progressively filters unmapped clinical sequencing data to pick out viral DNA sequences.

According to the team's report, the method involves repeatedly comparing sequences to the human genome, discarding those that align and keeping those that don't, to drill down to any patterns specific to known viruses.
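The filtering idea can be illustrated with a toy Python sketch. This is not the Wash U pipeline, whose tools and parameters are not given here. It assumes exact k-mer sharing as a crude stand-in for alignment, and the function names, reference sequences, and reads are all invented for the example.

```python
# Toy illustration only: exact k-mer sharing stands in for a real aligner,
# and every sequence below is made up for the example.

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(read, reference_kmers, k):
    """True if the read has at least one k-mer in common with the reference."""
    return any(read[i:i + k] in reference_kmers
               for i in range(len(read) - k + 1))


def filter_human(reads, human_refs, k=8):
    """Run one filtering round per human reference, dropping reads that match."""
    remaining = list(reads)
    for ref in human_refs:
        ref_kmers = kmers(ref, k)
        remaining = [r for r in remaining if not shares_kmer(r, ref_kmers, k)]
    return remaining


def assign_viruses(reads, viral_refs, k=8):
    """Count, per virus name, the surviving reads that match its reference."""
    counts = {}
    for name, ref in viral_refs.items():
        ref_kmers = kmers(ref, k)
        hits = sum(shares_kmer(r, ref_kmers, k) for r in reads)
        if hits:
            counts[name] = hits
    return counts


# Hypothetical references and reads.
human_refs = ["ACGTACGTTTGACCA", "GGGCCCAAATTTGGC"]
viral_refs = {"virus_A": "TTAGGCATCGATCGGA"}
reads = [
    "ACGTACGTTT",  # matches the first human reference
    "CCCAAATTTG",  # matches the second human reference
    "GGCATCGATC",  # matches the viral reference
    "TATATATATA",  # matches nothing
]

non_human = filter_human(reads, human_refs)
viral_hits = assign_viruses(non_human, viral_refs)
print(non_human)
print(viral_hits)
```

In this sketch, reads that survive every round of human filtering are only then compared against viral references, which mirrors the discard-and-keep logic the report describes. A real analysis would use an aligner and curated reference databases rather than exact substring matches.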

In the study, Duncavage and his co-authors reported on two preliminary applications of their approach. The first was a proof-of-principle experiment showing that they could glean evidence of viruses from the discarded unmapped reads of clinical cancer sequencing data. The second was an attempt to identify viral pathogens in unmapped data from a group of 21 glioma cases sequenced as part of Wash U's clinical cancer sequencing program.

To prove that the unbiased discovery approach actually works, the group collected whole-exome sequencing data from eight patients with Merkel cell carcinoma and nine with oropharyngeal non-keratinizing squamous cell carcinoma, two cancers with well-established viral associations. For each of the samples, the presence or absence of viral DNA was already known, so the group was able to compare the results of its unmapped-reads approach to the established status for each sample.

"We knew there was virus in there and we detected it by other methods, so if we could show we could use our method on these cases, we were comfortable then applying it to others," Patrick Cimino, the study's first author and a Washington University pathologist, told In Sequence.

In this first test, the team's strategy identified Merkel cell polyomavirus (MCPyV) DNA in the unmapped reads of two of the five MCPyV-positive Merkel cell carcinoma cases, a sensitivity of 40 percent in this subset. In the nine squamous cell carcinoma cases, the approach picked out human papillomavirus (HPV) DNA in the unmapped reads of seven of the nine HPV-positive cases, for a sensitivity of 78 percent.
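The sensitivity figures follow directly from the counts reported above: cases in which the method found the virus, divided by cases already known to be positive. A minimal Python check, in which the function name is an assumption made for illustration:

```python
def sensitivity(detected, known_positive):
    """Fraction of known-positive cases in which the virus was detected."""
    if known_positive <= 0:
        raise ValueError("need at least one known-positive case")
    return detected / known_positive


# Counts as reported in the article.
mcpyv = sensitivity(2, 5)  # Merkel cell carcinoma, MCPyV-positive cases
hpv = sensitivity(7, 9)    # squamous cell carcinoma, HPV-positive cases

print(f"MCPyV: {mcpyv:.0%}")
print(f"HPV: {hpv:.0%}")
```

With only five and nine positive cases, these percentages are rough estimates, since one case more or less moves them by 20 and 11 points respectively.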

Based on these results, the researchers then went on to look at a set of gliomas with unknown viral associations. According to Duncavage, there has been some controversy over whether viruses play a role in high-grade gliomas, with some studies showing widespread presence of human cytomegalovirus (CMV), and others finding the opposite.

"In the literature on high-grade gliomas and glioblastoma, people have used a more targeted approach to look for specific viruses they thought might be relevant, mainly CMV, which has been controversial. We didn't think anyone had really taken an unbiased approach to these before," Duncavage said.

The Wash U team used their approach to search for viral signatures in the unmapped reads of a set of 21 high-grade gliomas that were sequenced as part of the university's clinical cancer sequencing program using a targeted panel covering 151 cancer-associated genes.

On average, among the 21 cases, about 38,000 sequencing reads, or 1.9 percent of the total reads, did not map to the human genome. The group analyzed these reads using their bioinformatics pipeline, finding that of the 21 cases, five harbored DNA from the Epstein-Barr virus while one showed evidence of roseolovirus.

Interestingly, according to Duncavage, none of the samples showed detectable human CMV, the virus up for debate as a pathological player in glioma, in their unmapped reads. This could be either because there really was no CMV present in the cancers, the authors wrote, or because the targeted cancer panel used to sequence the gliomas did not produce enough viral genomic coverage for any CMV present to be detected.

Meanwhile, what the presence of EBV or roseolovirus in the study's glioma cases means in terms of the role these viruses might play in the development of the cancer is unclear, according to Duncavage. The group looked for evidence of EBV transcription in the samples, but didn't find anything.

"We didn’t see transcription of the [EBV] DNA, but that doesn’t mean it couldn’t be an early event; that somehow [the virus] gets integrated into the genome and that is an initiating event [for the cancer]," Cimino added.

"We also did see EBV in the Merkel cell and squamous cell cases as well, but we had higher reads in the gliomas relative to those. So the possibility exists that it might be relevant," he said.

According to Duncavage, the team is now interested in looking for viral signatures in tumor types beyond glioma, most likely starting with lung and colon cancer.

As clinical cancer sequencing accelerates, the available targets for such analyses are ever multiplying, he said.

Though large clinical sequencing databases for other diseases don’t come close to what has been amassed in cancer, researchers are interested in looking for signals of viral involvement in other diseases where the method might also be applied, such as multiple sclerosis and other inflammatory or nervous system disorders, Cimino said.