By Julia Karow
This article, originally published Dec. 9, has been updated with comments from Rita Colwell, a cholera expert.
In a demonstration that its real-time single-molecule sequencing technology can quickly generate genomic information on a pathogen from a disease outbreak, Pacific Biosciences, in collaboration with Harvard Medical School, has characterized the bacterial strain from the recent Haitian cholera epidemic.
The study's main conclusion, that the Haitian cholera strain probably originated from South Asia rather than Latin America and was introduced by human activity, had already been reached by a team from the US Centers for Disease Control and Prevention using DNA fingerprinting technology. But the PacBio researchers provided a more complete picture of the Haitian cholera strain and its virulence, at relatively modest cost, indicating that the time might be ripe for sequencing-based rapid pathogen analysis.
"We are there," said Matthew Waldor, a professor of medicine at Harvard Medical School and the senior author of the study, published last week in the New England Journal of Medicine. "That's maybe a major implication for this study: Molecular epidemiology can be done with full genome sequencing with extreme rapidity and analytic power."
But some experts disagree with the conclusion of the study that the Haitian strain likely came from South Asia. "The paper was highly premature, based on very little evidence," said Rita Colwell, a professor at the University of Maryland College Park and at the Johns Hopkins School of Public Health, and a cholera expert, adding that she was "surprised that it was accepted for publication."
The entire project, from receiving the samples to submitting the paper for publication, took two weeks. According to Waldor, he and his colleagues obtained samples from Haiti on Nov. 5, cultured Vibrio cholerae, isolated its DNA, and shipped the DNA to PacBio along with DNA from four other V. cholerae strains. PacBio received the samples on Nov. 10, completed the first draft genomes on Nov. 12, and analyzed the data over the next several days. The researchers submitted their paper for publication on Nov. 19.
In total, the PacBio team sequenced five cholera samples: two isolates from Haiti, one Peruvian isolate from 1991, and two Bangladeshi isolates from 1971 and 2008.
For each strain, they obtained between 28-fold and 60-fold coverage within 48 hours of sequencing and mapped the reads to a reference sequence to call structural variations and single nucleotide variants. The average read length of unfiltered sequence for the two Haitian strains was about 950 base pairs, with 5 percent of the reads exceeding 2,800 bases.
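Fold coverage, the figure quoted above, is the total number of sequenced bases divided by the length of the genome. The sketch below is a hypothetical illustration of that arithmetic, not the researchers' pipeline; the function names and the genome size of roughly 4 million bases used in the example are assumptions.

```python
import math

# Hypothetical helpers illustrating fold coverage: total sequenced bases
# divided by genome length. Not PacBio's or the study's actual code.

def fold_coverage(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Return mean fold coverage = (number of reads x mean read length) / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * mean_read_length / genome_length


def reads_needed(target_coverage: float, mean_read_length: float, genome_length: int) -> int:
    """Return the smallest whole number of reads whose expected coverage reaches the target."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_length / mean_read_length)


if __name__ == "__main__":
    # Assumed genome size of about 4 million bases (a round figure, not from the study);
    # 950 bases is the average unfiltered read length reported for the Haitian strains.
    genome = 4_000_000
    n = reads_needed(28, 950, genome)
    print(n, round(fold_coverage(n, 950, genome), 2))
```

Under these assumptions, reaching 28-fold coverage with 950-base reads takes on the order of 118,000 reads; real runs also lose some reads to filtering, so actual counts would differ.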
For the Haitian strain, they generated 12- to 15-fold genome coverage from six chips in a 90-minute sequencing run, according to Eric Schadt, PacBio's chief scientific officer, "which was enough coverage to identify all the larger structural variations that unambiguously indicated from which strain the Haiti strain had derived."
Besides comparing the genomes of the five strains they sequenced, the researchers placed them in the context of 23 other cholera strains previously sequenced by Colwell's group using a combination of Sanger, 454, and Illumina data, and compared them to more than 70 strains that the CDC had genotyped for several dozen markers. "We used all this information to position the five strains we sequenced within phylogenetic trees to identify the lineage of the Haitian isolates," Schadt said.
The consumables cost for sequencing the five samples would currently be "on the order of a couple of thousand dollars" for a customer of PacBio's commercial platform, he said.
The results showed that the Haitian strain is most closely related to the South Asian strains, not the Latin American one. "That implies that human activities were very likely the vehicle for transport of cholera to Haiti," Waldor said. "Although in some ways that is disturbing, also, we can do something about it in the future."
And because the strain appears to be more pathogenic than strains occurring in Latin America, stopping its spread throughout the region, for example by vaccination, is now more urgent, Schadt said.
Researchers from the CDC had already come to the same conclusion regarding the likely origin of the strain using a DNA fingerprinting method, pulsed field gel electrophoresis. They reported their findings on the CDC website on Nov. 1, noting that they were also planning to sequence the genome of the Haitian strain. They have since deposited unassembled genome sequence data from three Haitian strains in GenBank, and according to the paper, the PacBio team found these to be very similar to its own isolates.
Colwell said that her team has compared the CDC's sequence data in GenBank to her previously sequenced strains from around the world and found a "significant difference" between the Haitian strains and the Asian strains. Although her analysis, which her team plans to publish, shows that the strains are similar, there is not enough evidence to say that the Haitian cholera actually came from Asia, according to Colwell.
She also disagrees with the paper's conclusion that, based on the fact that it differs from Latin American and US Gulf Coast strains, the Haitian strain did not arise from the local aquatic environment. "How do they know? They have not sequenced an environmental strain," she said, adding that her team is in the process of sequencing DNA extracted from water samples as well as additional isolates from Haitian patients, in collaboration with scientists from the Institute for Genome Sciences at the University of Maryland.
In an e-mail response, Schadt said he looks forward to seeing the results and methods from Colwell's group and the CDC in a peer-reviewed journal.
In light of the CDC's early DNA fingerprinting results, sequencing the cholera strain so quickly was arguably not that urgent. However, the study provided proof of principle for the practical use of the PacBio platform. "By showing that we could do this, basically, in 48 hours, it shows the tremendous capabilities of this type of technology for the future of outbreak analysis and prediction," Waldor said, adding that "scientific competition" was another incentive for getting the genome sequence fast.
Colwell noted that the quality of the sequence data is not the reason she disagrees with the study's conclusions. "The sequencing is excellent; I have no quibble whatsoever," she said. "I just don't think that the analysis of the sequence was effective."
In fact, Colwell said her own team is also working with Pacific Biosciences on a microorganism identification project, but she declined to provide further details at this time. The project uses PacBio's sequencing platform and her team's bioinformatics approach. Colwell is the president and chairman of CosmosID, a bioinformatics company focusing on the interpretation of genomic data.
While other next-gen sequencing platforms would likely have identified the same single nucleotide variants, Schadt said the turnaround time would have been longer, and the reads would probably have been too short to detect large structural variations, especially in regions with repetitive elements.
"The problem even with 454 is that the 400- to 500-base pair reads are still not long enough to span many of the repetitive elements that occur in these structurally variant regions, which can be on the order of 1 to 10 kilobases," he said. "You need read lengths that can completely span these repetitive elements to map them unambiguously."
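The point Schadt makes can be put in simple interval terms: a read places a repeat unambiguously only if it covers the whole repeat plus some unique flanking sequence on both sides. The sketch below is a hypothetical illustration of that reasoning, not a real aligner; the function names and the 50-base minimum anchor length are assumptions.

```python
# Hypothetical illustration: can a single read span a repetitive element
# and still be anchored in unique sequence on both sides?
# Intervals are half-open: [start, end).

def spans_repeat(read_start: int, read_length: int,
                 repeat_start: int, repeat_end: int, min_anchor: int) -> bool:
    """True if the read covers the entire repeat plus at least
    min_anchor unique bases on each flank."""
    read_end = read_start + read_length
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)


def max_spannable_repeat(read_length: int, min_anchor: int) -> int:
    """Longest repeat a read of this length can span with both anchors."""
    return max(0, read_length - 2 * min_anchor)


if __name__ == "__main__":
    anchor = 50  # assumed minimum unique flank on each side
    for length in (500, 2_800):  # a 454-scale read and a long PacBio read from the article
        print(length, max_spannable_repeat(length, anchor))
    # A 1-kilobase repeat at positions [100, 1100):
    print(spans_repeat(0, 500, 100, 1_100, anchor))    # short read cannot span it
    print(spans_repeat(0, 2_800, 100, 1_100, anchor))  # long read can
```

Under these assumptions, a 500-base read can span repeats only up to 400 bases, while a 2,800-base read reaches about 2.7 kilobases; repeats toward the 10-kilobase end of the range Schadt cites would still need longer reads or an assembly-based approach.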
Obtaining a more complete view of the Haitian cholera strain than DNA fingerprinting can provide allowed the researchers to say with more confidence that the strain originated from South Asia, Schadt said. It also allowed them to gain additional insights into its makeup. "For example, we found that the sequence of the cholera toxin is different in this strain than it is in strains in Latin America," said Waldor. That difference has implications for how quickly the strain might spread.
Schadt said that he and his colleagues will continue to work with the Harvard researchers on sequencing more V. cholerae strains from recent outbreaks, for example in Western Africa, in order to determine their lineage.
In addition, they are working on obtaining complete genome sequences for the five strains they sequenced for their recent study, using a de novo assembly approach that combines reads from PacBio and Illumina. "The combination of longer reads [from PacBio] that have somewhat lower accuracy with the short reads of Illumina that have higher accuracy provides for a very powerful de novo assembly approach that can't be achieved with the second-gen technologies alone," he said.
Have topics you'd like to see covered by In Sequence? Email the editor at jkarow [at] genomeweb [.] com.