Researchers at McGill University have developed two protocols for sequencing bacterial genomes in under 24 hours to monitor outbreaks. Reporting at the BGI-sponsored International Conference on Genomics in Americas in Sacramento, Calif., last week, Ken Dewar, a professor at McGill University in Montreal, presented methods his laboratory has developed on both the Illumina MiSeq and Pacific Biosciences RS machines.
Dewar said that around two years ago the lab began investigating the possibility of sequencing and assembling a bacterial genome in less than a day for under $1,000.
"A lot of systems could offer genomes, theoretically, at a very low cost," said Dewar. "But the low cost is derived from pooling many samples together. And in the context of an urgent response to a single bacteria, this is very difficult."
As such, the team zeroed in on two sequencing technologies: Illumina's MiSeq and PacBio's RS. "We wanted the long reads of PacBio to put together long contiguous stretches and the accuracy of the MiSeq," Dewar said.
After evaluating each company's standard protocol (the MiSeq run with the TruSeq chemistry and paired 250-base reads, and the RS run with its standard SMRTbell library preparation), Dewar said that the turnaround time for each was still three to four days, rather than the desired 24 hours. So his group began tweaking those protocols to reduce the turnaround time.
The results are "rapid response" protocols that each run in roughly one day. Dewar's team replaced the TruSeq chemistry with Nextera for library prep on the MiSeq, and instead of running 250-by-250-base reads, reduced the reads to 150 bases, which cut sequencing time to 12 to 14 hours. "We just pull off the first 150 bases during the run," Dewar said.
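The trade-off behind the shorter reads can be illustrated with the standard coverage estimate for paired-end data: total bases sequenced divided by genome size. The sketch below is not from Dewar's presentation. The function name, the read-pair count, and the 5-megabase genome size are hypothetical values chosen only to show the arithmetic.

```python
def paired_end_coverage(n_read_pairs, read_length_bp, genome_size_bp):
    """Estimate average coverage for a paired-end run.

    Each pair contributes two reads of read_length_bp bases.
    All inputs here are illustrative assumptions, not figures from the talk.
    """
    total_bases = 2 * read_length_bp * n_read_pairs
    return total_bases / genome_size_bp


# Hypothetical run: 1 million read pairs against a 5 Mb bacterial genome.
full_length = paired_end_coverage(1_000_000, 250, 5_000_000)
shortened = paired_end_coverage(1_000_000, 150, 5_000_000)
print(full_length, shortened)
```

Under these assumed numbers, cutting reads from 250 to 150 bases leaves 60 percent of the coverage, which may still be ample for a small genome while saving hours of instrument time.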
To reduce turnaround time on the PacBio instrument, Dewar's group used a library prep method originally developed by researchers at the Wellcome Trust Sanger Institute (IS 12/18/2012).
Their technique, published in BioTechniques last year, requires one nanogram of starting DNA and can generate sequence data within eight hours of receiving the sample.
Dewar said that his group modified the Sanger team's original method slightly. For instance, he said, "We queue up more cells than we believe we need." The team will pull off the sequence data after the first or second SMRT cell and begin the bioinformatics analysis, while the sequencer continues to run. "If we have enough data, great. If not, we have more coming," Dewar said.
In fact, a key aspect of both the MiSeq and PacBio rapid response protocols is that they generate "contingency" sequence data, Dewar said. With the PacBio machine, the team runs extra SMRT cells, and with the MiSeq machine, they begin to analyze the data before the run is complete.
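The "contingency" idea described above can be sketched as a simple loop: add up the yield of each SMRT cell as it finishes, and stop waiting once cumulative coverage reaches a target. This is a minimal illustration under stated assumptions, not the lab's actual pipeline. The function name, the per-cell yields, the genome size, and the 50-fold coverage target are all hypothetical.

```python
def cells_needed(cell_yields_bp, genome_size_bp, target_coverage):
    """Return (cells_used, coverage) once cumulative coverage meets the target.

    cell_yields_bp lists the bases produced by each queued cell, in run order.
    Returns (None, final_coverage) if every queued cell is used without
    reaching the target. All values are illustrative assumptions.
    """
    total_bp = 0
    for cells_used, yield_bp in enumerate(cell_yields_bp, start=1):
        total_bp += yield_bp
        coverage = total_bp / genome_size_bp
        if coverage >= target_coverage:
            # Enough data: analysis can proceed without the remaining cells.
            return cells_used, coverage
    return None, total_bp / genome_size_bp


# Hypothetical example: three queued cells, a 5 Mb genome, 50x target.
queued = [150_000_000, 120_000_000, 140_000_000]
print(cells_needed(queued, 5_000_000, 50))
```

In this made-up case the target is met after the second cell, so the third cell is pure contingency, matching Dewar's "If not, we have more coming" approach.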
Dewar's team recently tested the protocols in collaboration with the Canadian Food Inspection Agency, the Public Health Agency of Canada, and McGill University's International Tuberculosis Center.
He said the tests were equivalent to "fire drills." Each agency would send in a blinded sample, and the team would get to work running it in order to identify the strain. Those samples would receive priority on the sequencers, he said.
In all three cases, the data was produced in less than 20 hours and was sufficient to close the genomes and respective plasmids. The three samples included Salmonella enterica, Listeria monocytogenes, and an unknown species. The fire drills demonstrated that "the technology and techniques are robust and don't require reference genomes," Dewar said.
Dewar said the team is now primarily using the RS for the initial protocol and the MiSeq to validate, but that could change. Each platform has advantages and disadvantages.
For instance, Dewar said the RS is able to generate high-quality complete bacterial genomes. "Feedback on the quality of the assemblies has been universally positive," he said.
"What's not clear is that, as PacBio shifts to longer and longer molecules as a starting point, how we would catch the small plasmids?" For instance, he said, if PacBio moves to 10-kilobase reads, it will make it difficult to detect virulence genes in a 5-kilobase plasmid, he said. Additionally, the RS has "stringent DNA quality requirements," he said. The DNA has to be a high molecular weight and pretty pure, which may not always be possible in an outbreak situation when DNA is extracted from clinical or environmental samples.
On the other hand, MiSeq data alone cannot always produce completely finished genomes due to repetitive elements in bacterial genomes.
Implementing one or both of the protocols in real-time outbreaks would be up to the health organizations, Dewar said. "The goal is to make sure they know that the infrastructure and capacity are here," he said. He envisions either his lab running the outbreak samples, or helping a hospital or public health agency set up the protocol in their own labs.
Costs are also coming down, making sequencing an attractive option for outbreak monitoring. Dewar estimates that today, using the rapid response protocol, a Clostridium difficile strain could be sequenced in under 24 hours for around $4,000, and, "I think we're approaching the $500 genome," he said.
Being able to identify co-infections will also be important for implementing the protocol in real-world settings, Dewar said.
"It just seems inherently risky to isolate a single colony and make a sequence from it and say that's the only thing that was there," he said.
Additionally, he is now investigating how to identify an outbreak strain from a mixed or complex sample.
"I would like to go into a microbiome and pull out the 50 most common genomes as [finished] genomes," he said.
To do this, Dewar explained that he is testing the ability of Illumina's HiSeq to generate large amounts of sequence information at high coverage, which will be important for identifying rarer strains, and then using the PacBio RS as a scaffolding tool.