NEW YORK (GenomeWeb) – A University of California, San Francisco-led team has completed a proof-of-principle study on a sequencing test and analysis pipeline for detecting viral pathogens, such as the Ebola virus, in clinical samples that they believe could help researchers in resource-poor settings diagnose infections quickly and relatively inexpensively.
The test uses Oxford Nanopore Technologies' MinIon sequencing instrument and a web-based computational pipeline dubbed MetaPore, which offers mapping and visualization tools. The researchers, led by Charles Chiu, an associate professor of laboratory medicine at UCSF, used the test to identify the Ebola virus in blood samples collected from two African patients who had been diagnosed with acute hemorrhagic fever. They also used the test to detect the chikungunya virus in a sample from an asymptomatic Puerto Rican patient who later developed symptoms, and to detect the hepatitis C virus in blood from an infected UCSF patient.
Chiu had presented the results at a conference this spring and published them as a preprint. This week, the study appeared in Genome Medicine. According to the paper, the total sample-to-answer time for the assay was less than six hours, compared with more than 20 hours for a similar assay using Illumina's MiSeq. Furthermore, the researchers were able to identify the viruses present in a sample within minutes of obtaining the data: within four to 10 minutes for the Ebola and chikungunya viruses, according to the paper.
To demonstrate the assay's efficacy in a real-world setting, Chiu told GenomeWeb, he and his colleagues are now working on placing the MinIon-MetaPore system in regional labs in Central and West Africa and on providing training that will enable researchers there to run the assay themselves. Chiu's team has already begun collaborating with the National Institute of Biomedical Research in Kinshasa, Democratic Republic of the Congo, which handles cases of hemorrhagic fever in the country.
As part of those efforts, the researchers are working on making all associated informatics components locally available, so that users do not necessarily need internet access. Although MetaPore can run on a laptop or local compute server, currently the base calling step of the sequencing process runs on Metrichor, Oxford Nanopore's cloud-based infrastructure. As an alternative, the researchers plan to implement a local version of the nanopore base calling program at user sites.
This should not add to computation times, according to Chiu. In fact, it could possibly make things faster since time is not wasted uploading and downloading reads to and from the cloud. For instance, in one of the Ebola cases mentioned in the study, initial detection of the virus' presence took roughly three minutes. Running all the computation locally could possibly shorten that to 30 seconds, Chiu said.
MetaPore shares some structural and formatting similarities with another computational pipeline, designed for Illumina sequence data, that Chiu's lab developed with collaborators at a number of institutions in the US and abroad. The so-called sequence-based ultrarapid pathogen identification (SURPI) pipeline, described in a Genome Research paper last year, uses a computational subtraction approach to separate host sequences from non-host ones and then uses two alignment algorithms to match the remaining reads to candidate pathogen data stored in NCBI's repositories. The first algorithm matches sequences to NCBI's human and pathogen databases, while the second compares sequences to NCBI's protein databases.
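The computational subtraction idea can be illustrated with a toy sketch. It is not SURPI's implementation: SURPI relies on dedicated alignment software, whereas this example swaps in exact k-mer matching as a simple stand-in. The function names `kmers`, `build_index`, and `subtract_and_classify` and the 8-base k-mer size are hypothetical choices made only for illustration.

```python
from collections import defaultdict

K = 8  # hypothetical toy k-mer size; real aligners use longer seeds and tolerate mismatches


def kmers(seq: str, k: int = K) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict, k: int = K) -> dict:
    """Map each k-mer to the names of the reference sequences that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def subtract_and_classify(reads: dict, host_seq: str, pathogen_refs: dict, k: int = K) -> dict:
    """Drop reads sharing any k-mer with the host, then label each remaining read
    with the pathogen reference that shares the most k-mers with it."""
    host_kmers = kmers(host_seq, k)
    pathogen_index = build_index(pathogen_refs, k)
    results = {}
    for read_id, seq in reads.items():
        read_kmers = kmers(seq, k)
        if read_kmers & host_kmers:
            continue  # computational subtraction: treat the read as host-derived
        hits = defaultdict(int)
        for km in read_kmers:
            for name in pathogen_index.get(km, ()):
                hits[name] += 1
        results[read_id] = max(hits, key=hits.get) if hits else "unclassified"
    return results
```

Host-matching reads simply disappear from the output, which mirrors the "subtract first, then align what is left" ordering; everything that survives is either assigned to a candidate pathogen or left unclassified.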
At Cambridge Healthtech Institute's Molecular Medicine Tri-Conference in February, Chiu described an assay on Illumina's MiSeq system that he said would use the SURPI pipeline for data analysis. At the time, Chiu said he planned to launch the assay initially as a laboratory-developed test within three to six months. This week, he told GenomeWeb that he and his collaborators plan to launch the clinical assay early next year. The project is one of two demonstration studies selected last month to share $2.4 million in funding from the California Initiative to Advance Precision Medicine.
There are some differences between the two informatics pipelines, according to Chiu. For example, because nanopore data have higher error rates, MetaPore uses a local aligner instead of a global aligner to map reads to reference databases, but otherwise the two pipelines work in essentially the same way. Like SURPI, MetaPore first cleans input sequences and then uses two algorithms, BLASTn and MegaBLAST, to remove host reads by mapping them to the human reference and to map the remaining reads to viral datasets or to all non-human datasets in GenBank.
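The difference between the two kinds of aligner can be sketched with textbook dynamic programming. A global (Needleman-Wunsch) alignment must account for every base of both sequences, while a local (Smith-Waterman) alignment scores only the best-matching stretch, so junk or error-dense read ends do not drag the score down. This is a minimal illustration, not MetaPore's code: BLAST-family tools use heuristic seeded local alignment rather than full Smith-Waterman, and the match, mismatch, and gap scores below are arbitrary assumptions.

```python
def global_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: leading gaps in a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)  # column 0: leading gaps in b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman score: best-scoring pair of substrings, with cells floored at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For a read made of a nearly matching core flanked by unrelated bases, `local_score` still reflects the strong core match, whereas `global_score` is dominated by the gap penalties needed to cover the flanks.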
"All of this analysis is done in real time, meaning that as the sequences are being generated, we … update the results every minute," Chiu explained. Results are presented in pie charts that show, for example, what viruses are present and in what proportion. The system can also output the raw data or present the reads in tabular format for further analysis, he said.
Furthermore, the Genome Medicine study showed that only a few reads are needed to identify the pathogens present in a sample. Part of the reason is that viral genomes have very distinctive sequences, Chiu explained, so in theory a single read could be enough to identify a virus, which is essentially what happens with PCR. In practice, to avoid misinterpretation, MetaPore applies a threshold that requires two separate viral gene regions to be identified before it makes a positive ID. An alternative metric would be to require a minimum level of genome coverage, Chiu said.
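The two-region rule and the coverage alternative could look roughly like this. The sketch assumes, hypothetically, that "regions" are fixed 1,000-base windows of the viral genome and that each hit is reported as a (virus, start position) pair; MetaPore's actual definition of a gene region, and any coverage cutoff, may differ. `call_positive` and `genome_coverage` are illustrative names.

```python
from collections import defaultdict


def call_positive(hits, region_size: int = 1000, min_regions: int = 2) -> set:
    """hits: iterable of (virus, start) pairs. Call a virus positive only if its
    hits fall in at least `min_regions` distinct genome windows (hypothetical rule)."""
    regions = defaultdict(set)
    for virus, start in hits:
        regions[virus].add(start // region_size)  # window index of this hit
    return {virus for virus, seen in regions.items() if len(seen) >= min_regions}


def genome_coverage(intervals, genome_length: int) -> float:
    """Fraction of genome positions covered by at least one [start, end) interval.
    A plain set is fine for viral-sized genomes of a few tens of kilobases."""
    covered = set()
    for start, end in intervals:
        covered.update(range(max(0, start), min(end, genome_length)))
    return len(covered) / genome_length
```

Requiring hits in separate regions guards against a single spurious or contaminating read triggering a call; a coverage threshold expresses the same idea as a continuous quantity.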
A second reason for needing fewer reads is that nanopore reads are longer than those generated by short-read technologies and as such cover more genomic ground. "When we looked at the viruses, the read lengths averaged about 400 bases but we were [also] getting read lengths as long as 900 bases," Chiu said. "That's fairly significant, given that the genome is only [about] 19,000 bases ... we are getting almost one kilobase of sequence. That's a single sequence that because of its length tends to be very unique and identifiable."
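A back-of-the-envelope calculation, using the read lengths and the roughly 19,000-base genome size Chiu cites, shows how much of the genome a single read spans. The second line adds an idealized assumption that does not come from the paper: in a uniformly random sequence of length G, the expected number of chance exact occurrences of a given L-base string is about (G - L + 1)/4^L, which is already vanishingly small at L = 20.

```latex
\frac{400}{19\,000} \approx 0.021 \;(\approx 2.1\%), \qquad
\frac{900}{19\,000} \approx 0.047 \;(\approx 4.7\%)

\mathbb{E}[\text{chance matches}] \approx \frac{G - L + 1}{4^{L}}
  = \frac{19\,000 - 20 + 1}{4^{20}}
  \approx \frac{1.9 \times 10^{4}}{1.1 \times 10^{12}}
  \approx 1.7 \times 10^{-8}
```

Real reads carry sequencing errors and real genomes are not random, so this only suggests why long reads are rarely ambiguous; it is not a measure of the pipeline's specificity.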
This is also why the approach works despite the high error rate of nanopore sequencing, he added, a point borne out by comparisons, reported in the paper, between viral sequences generated with nanopore and Illumina technologies. Although the Illumina reads were more accurate, phylogenetic analyses of the two datasets showed that the results from both systems were comparable.
Beyond getting the system into local researchers' hands, Chiu's team is also exploring ways to improve the sensitivity of the technique. Although the paper reports that the researchers detected the Ebola and chikungunya viruses in patient samples within four to 10 minutes of data acquisition, those times are only possible with samples that have very high viral titers, 10⁷ to 10⁸ copies per milliliter. The researchers were able to detect the hepatitis C virus at a lower titer, 10⁵ copies per milliliter, but identification took nearly 40 minutes, according to the paper.
Part of the reason is the method's unbiased approach to sample testing, which results in far more host reads than viral reads in the final output, Chiu explained. Ways to improve sensitivity that the team is currently exploring include adding a viral enrichment step and developing new methods for removing host reads. Improved throughput would also help, he said, adding that Oxford Nanopore is already making improvements to its technology that will address this. Ultimately, "I want to get down to a sensitivity of about 100 to 1,000 viral copies per mil[liliter] of blood," he said. "If we could do that, it would be comparable to gold standard PCR testing for these viruses." On the informatics front, the researchers have switched to a faster alignment tool, the Spliced Transcripts Alignment to a Reference (STAR) software, to help speed up the mapping process, he said.
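Why host-dominated samples slow detection can be sketched with a simple binomial model. The assumptions here are illustrative and not taken from the paper: each read is independently viral with probability p (the viral fraction of the library), a call needs m = 2 viral reads (loosely mirroring the two-region rule), and the target is a 95% chance of reaching that count. `prob_at_least` and `reads_needed` are hypothetical helpers.

```python
from math import comb


def prob_at_least(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p): chance that n reads include >= m viral reads."""
    return 1.0 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m))


def reads_needed(p: float, m: int = 2, confidence: float = 0.95) -> int:
    """Smallest read count n with P(at least m viral reads) >= confidence.
    Uses doubling to bracket the answer, then binary search (the probability
    is non-decreasing in n)."""
    lo, hi = m, m
    while prob_at_least(m, hi, p) < confidence:
        lo, hi = hi, hi * 2  # lo fails, hi is the next candidate
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least(m, mid, p) >= confidence:
            hi = mid
        else:
            lo = mid + 1
    return hi
```

Under this model the required read count grows roughly as 1/p for small viral fractions, so a tenfold gain in viral fraction from enrichment or host depletion would cut the reads needed, and hence the sequencing time at a fixed throughput, by roughly tenfold.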
They are also mulling the possibility of using the platform to diagnose bacterial infections, Chiu told GenomeWeb, and are exploring unbiased metagenomics, probe-based target enrichment, and amplicon sequencing of 16S rRNA or other conserved sequences for that purpose. Eventually, they would like to create a "pan-pathogen" platform that also covers fungi and parasites, similar to their current MiSeq assay.