NEW YORK (GenomeWeb) – Researchers at Columbia University have developed a capture-based targeted sequencing method that can identify viral genomes from clinical samples.
The team described the method this week in the journal mBio and plans to continue refining it to increase sensitivity and make it suitable for diagnostics, co-lead authors Amit Kapoor and Thomas Briese told GenomeWeb.
The group will eventually look to license the technology to a partner that would commercialize the test. In the study, the researchers estimated that if 20 samples are multiplexed, the test could cost around $40 per sample.
Kapoor said the group wanted a better way to identify unknown viruses present in clinical samples. Currently, viral diagnostics typically rely on PCR, which is sensitive, he said, but requires the clinician to know what to look for. Instead, he said, "we should consider all different kinds of pathogens that are out there, and not just focus on one or a group of pathogens."
Between 30 percent and 50 percent of cases of respiratory illness and gastroenteritis remain of unknown etiology, Kapoor said. "To fill that gap, we thought we needed a comprehensive platform that includes everything we know, so we're unbiased."
Briese added that most clinical tests are currently serology- or PCR-based. When running those tests, "clinicians always have to make a decision of what is the agent assumed to be there or what to target," he said. If the test is negative, clinicians have to choose another test. After a while, testing becomes expensive and time-consuming.
The Columbia team decided to try to design a test that would target all known vertebrate viruses, with a particular focus on those that infect humans. They selected oligonucleotides representing every known viral taxon that contains at least one virus known to infect vertebrates. They excluded most plant and insect viruses, unless a virus was known to also infect humans.
Designing a universal assay for viruses is tricky. Bacterial genomes contain a conserved 16S region that can be targeted in a mixed sample to get a snapshot of the different classes of bacteria present, but viral genomes contain no such region. In addition, viral genomes are typically small and mutate frequently.
The researchers used the EMBL Coding Domain Sequence database to extract viral genome sequences and created 100-mer oligonucleotides. Because they wanted their assay to target all variations of specific viral taxa (for instance, all strains of HIV), they allowed for some divergence. The final assay, dubbed VirCapSeq-VERT, consisted of just under 2 million probes between 50 and 100 nucleotides in length that covered around 207 different taxa of viruses. In some cases, a single taxon could include hundreds to thousands of viral genomes, Kapoor said.
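The tiling idea behind a probe library of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `tile_probes`, the 50-nucleotide step, and the toy sequence are assumptions made for the example, not details of the Columbia design. Only the 50 to 100 nucleotide probe length range comes from the description above.

```python
def tile_probes(sequence, probe_len=100, step=50, min_len=50):
    """Cut a coding sequence into overlapping probes.

    probe_len and min_len mirror the 50-100 nt range reported for the assay;
    step is a hypothetical spacing chosen only for illustration.
    """
    probes = []
    for start in range(0, len(sequence), step):
        probe = sequence[start:start + probe_len]
        # Drop trailing fragments that are shorter than the minimum length.
        if len(probe) >= min_len:
            probes.append((start, probe))
    return probes


# Toy 230-nt "coding sequence" (made up, not a real viral gene).
toy_cds = "ATGC" * 57 + "AT"
for start, probe in tile_probes(toy_cds):
    print(start, len(probe))
```

Overlapping tiles mean every position in the toy sequence is covered by at least one probe. A real design would also have to handle sequence divergence between strains, which this sketch ignores.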
After designing the probe library, the team mapped it against a database of 100 reference viral genome sequences representing double- and single-stranded DNA and RNA, positive- and negative-strand RNA, and circular, linear, and segmented viruses. They set a threshold of 90 percent nucleotide identity and found that the VirCapSeq-VERT assay covered all the coding regions but not the noncoding regions, and hybridized only to vertebrate virus genomes, not to genomes from bacteriophages or plant, insect, and fungal viruses.
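To make the 90 percent identity threshold concrete, here is a minimal sketch of how probe coverage of a reference genome could be checked. It is a simplification under stated assumptions: it compares each probe against ungapped windows of the genome, whereas a real analysis would use an alignment tool. The function names and the toy sequences are hypothetical.

```python
def percent_identity(probe, target):
    """Percent of matching positions between two equal-length sequences."""
    if len(probe) != len(target):
        raise ValueError("sequences must be the same length")
    matches = sum(a == b for a, b in zip(probe, target))
    return 100.0 * matches / len(probe)


def covered_positions(probes, genome, threshold=90.0):
    """Genome positions covered by any probe that matches an ungapped
    window of the genome at or above the identity threshold."""
    covered = set()
    for probe in probes:
        n = len(probe)
        for start in range(len(genome) - n + 1):
            if percent_identity(probe, genome[start:start + n]) >= threshold:
                covered.update(range(start, start + n))
    return covered


# Toy 30-nt genome and 10-nt probes (made up for illustration).
genome = "ACGTTGCAAGGCTTAACCGGATATCGCGTA"
near_match = "ACGTAGCAAG"  # one mismatch against genome[0:10], so 90% identity
unrelated = "TTTTTTTTTT"   # matches no window at 90% or better

covered = covered_positions([near_match, unrelated], genome)
print(f"{len(covered)} of {len(genome)} positions covered")
```

Positions left uncovered in such a check would correspond to regions the probe set cannot capture, such as the noncoding regions mentioned above.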
Next, the group wanted to test the assay's ability to extract viral sequences from a background of mostly human sequences. Starting with blood nucleic acids or lung tissue nucleic acids, they spiked in various amounts of viral nucleic acids and then ran the VirCapSeq-VERT assay. The assay resulted in a 100- to 1,000-fold increase in on-target viral reads compared with shotgun sequencing alone. The assay also reduced the human host background reads from 99.7 percent to 68.2 percent in the lung mixture and from 99.4 percent to 38.5 percent in the blood mixture. In addition, they were able to recover full-length sequences for more than 95 percent of all the viral sequences.
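The host background figures can be turned into a rough fold change with simple arithmetic. The sketch below computes how much the share of non-host reads grows after capture. Note the assumption: non-host reads are not necessarily all viral, so this is not the same quantity as the on-target enrichment the authors reported, and the function name is hypothetical.

```python
def non_host_fold_change(host_pct_before, host_pct_after):
    """Fold change in the share of reads that are NOT host-derived."""
    before = 100.0 - host_pct_before
    after = 100.0 - host_pct_after
    return after / before


# Host background percentages as reported in the article.
for tissue, before, after in [("lung", 99.7, 68.2), ("blood", 99.4, 38.5)]:
    fold = non_host_fold_change(before, after)
    print(f"{tissue}: about {fold:.1f}-fold larger share of non-host reads")
```

For both mixtures the result is on the order of 100-fold.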
The researchers then wanted to determine the assay's lower limit of detection. To do this, they tested various levels of West Nile virus and the human herpes simplex 1 virus in both lung tissue and blood. At input levels of 100 viral copies in 50 ng of blood or 1,000 viral copies in 100 ng of lung, the assay recovered more than 90 percent of each genome, corresponding to a blood clinical specimen containing approximately 1,200 copies per ml or a tissue clinical specimen containing approximately 16,000 copies per mg. At the lowest input level tested, 100 viral copies per ml of blood, the assay captured 29 percent of the herpes virus and 7 percent of the West Nile virus.
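The conversion from copies per nanogram of input to copies per milliliter or milligram of specimen implies a particular nucleic acid yield. The sketch below back-calculates that yield from the two pairs of figures above. The yields are derived here for illustration and are not stated in the article, and the function name is hypothetical.

```python
def implied_yield_ng_per_unit(copies_in_input, input_ng, copies_per_unit):
    """Nucleic acid yield (ng per ml or per mg of specimen) implied by
    a copies-per-input figure and the matching copies-per-specimen figure."""
    copies_per_ng = copies_in_input / input_ng
    return copies_per_unit / copies_per_ng


# Blood: 100 copies in 50 ng corresponds to about 1,200 copies per ml.
blood_ng_per_ml = implied_yield_ng_per_unit(100, 50, 1_200)
# Lung: 1,000 copies in 100 ng corresponds to about 16,000 copies per mg.
lung_ng_per_mg = implied_yield_ng_per_unit(1_000, 100, 16_000)

print(f"blood: {blood_ng_per_ml:.0f} ng of nucleic acid per ml (back-calculated)")
print(f"lung: {lung_ng_per_mg:.0f} ng of nucleic acid per mg (back-calculated)")
```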
Finally, they tested 1 ml of human blood and serum samples spiked with live enterovirus. The assay could detect the virus in both sample types down to a concentration of 10 copies per ml, "comparable to the sensitivity of real-time PCR," the authors wrote.
When they compared their assay with other enrichment techniques, such as DNase treatment and rRNA depletion, the researchers found that the VirCapSeq-VERT assay produced a 10,000-fold increase in mapped read counts and was able to recover the full genomes of most viruses, even when fewer than 1,000 copies of the target were input.
Being able to characterize the full genome of the viruses is an advantage of the test, Briese said. "With PCR, you get a positive or negative result," he said. But with the VirCapSeq-VERT assay, "we can characterize the sequence precisely," enabling researchers to learn how similar or different that specific viral genome is from others.
In addition, unlike PCR, the sequencing-based assay can detect multiple viruses present in the same sample, added Kapoor. "More and more clinicians are realizing that coinfections are very important," he said. Viruses often interact with each other.
For instance, "there might be a pre-existing infection that dysregulates the immune system without causing disease but enables another agent to come in more easily and cause disease, which it might not do in people without that underlying infection," Kapoor said. "With this platform, we can detect those cases."
Other groups, like Charles Chiu's team at the University of California, San Francisco, have turned to metagenomic sequencing to assess unknown infections in patients. But Briese said that metagenomic sequencing to identify a virus is difficult because the viral sequences make up such a small portion of the total genomic information. With metagenomic sequencing, "you sequence all the human DNA, all the bacteria, the phages, everything, and then against that background you have to find a few molecules of virus."
Kapoor said that the group's next step is to continue to optimize the test to make it more sensitive. In addition, he said they are considering designing more syndrome-specific tests that would boost coverage of viruses related to respiratory conditions, for instance. Such an assay would still target all the other viruses, but would just devote more of the probes to viruses related to the specific syndrome.
"The plan is to make it syndrome specific in a way that we will not compromise on the extent of diversity that we cover," he said. So, for instance, in an assay focused on respiratory disease, about half of the 2 million probes would be devoted to respiratory-related viruses, while the other half would target the remaining viruses.
Kapoor envisions having several of these assays, each focused on a specific group of syndromes, which would make it "more efficient and specific for that patient's syndromes." But, at the same time, "more and more we see unexpected viruses in places we don't expect them to be, so we don't want to lose the capacity of finding these."