Scientists at Helicos BioSciences and their academic collaborators have shown for the first time that an entire genome can be analyzed by sequencing single molecules of DNA.
To showcase its “true single molecule sequencing” technology, company researchers re-sequenced the 6.4-kilobase M13 virus and published their results, along with a description of the technology, in Science last week.
Although the data, which was generated using an earlier version of Helicos’ technology, revealed problems with long homopolymers and deletion errors, it demonstrates that single-molecule sequencing is possible, and that its quality is sufficient for several research applications, according to scientists.
Helicos’ sequencing method “is clearly in the same league as the other second-generation methods, which require some form of polony amplification,” such as those sold by 454, Illumina, ABI, and Danaher, George Church, a professor at Harvard Medical School, told In Sequence last week. “This technology would certainly be interesting to test in pilot sequencing projects like the 1,000 Genomes Project or the [Personal Genome Project],” said Church, who is a member of Helicos’ scientific advisory board.
However, what is still missing from this and other published sequencing projects that have used next-generation sequencing technologies is a “rigorous cost analysis,” Church said.
The company said in a press release that it plans to publish “many other” scientific reports in the coming months that will focus on BAC sequencing and microRNA sequence analysis.
Helicos’ technology differs from other next-generation platforms in that it sequences single DNA molecules, which avoids bias introduced by amplifying DNA prior to sequencing.
“Even though [amplification] is not a big worry for many, it is a concern in the back of people’s minds,” Yuan Gao, an assistant professor at Virginia Commonwealth University, told In Sequence in an e-mail message. The researchers show no data, he noted, that backs their claim that amplification-based sequencing technologies are biased, but “the ability to go single molecule is a true plus for many applications.”
According to Gao, who is a former postdoc in Church’s lab, both the read length — about 25 bases on average — and error rate presented in the study are already “acceptable for many applications,” such as digital gene expression, microRNA studies, ChIP sequencing, and comparative genome studies of microbes. However, Gao stressed that several aspects of the technology could still be improved, including throughput, error rate, and the ability to sequence mate pairs.
A Milestone
In their study, Helicos scientists sequenced the M13 virus to more than 150-fold average depth and with 100-percent coverage. Comparing their data to M13 genomes with simulated mutations, they were able to discover more than 98 percent of these mutations and did not call any false positives.
Helicos’ M13 project, although modest in size and complexity compared to larger genomes, is significant because “it is a milestone in developing single-molecule sequencing that was sought for many years in order to be able to reduce cost and increase throughput,” said Ido Braslavsky, an assistant professor in physics and astronomy at Ohio University and one of the article’s authors.
As a postdoc in Steve Quake’s lab at Caltech, in 2003, he published a proof-of-concept study on single-molecule sequencing in PNAS that became the basis for Helicos’ technology. In that study, he did not generate consecutive sequence data but had to introduce spacers. The current study is “a demonstration of the system and how it works, but this is not the limit of the system,” he said.
The data for the project was generated at least a year ago, according to Braslavsky, who helped Helicos transfer the technology from Caltech and start scaling it up. In a press release from December 2005, Helicos said it had sequenced M13 but did not publish any data at the time.
Helicos scientists had to develop Braslavsky’s method further in order to be able to “routinely and reliably detect single fluors with sufficient signal-to-noise ratios,” said Kevin Ulmer, a former full-time consulting scientist for Helicos who was involved in developing the platform.
For example, they had to develop “highly rinsable” surfaces so that, after each incorporation cycle, they could wash off fluorescent nucleotides that would otherwise create background noise. Another method, called total internal reflection fluorescence, or TIRF, also helps with reducing background, he said.
In order to protect the labels, they also had to come up with a protective solution to prevent photobleaching and other photochemistry events, he said.
Further, Helicos developed nucleotides with cleavable fluors so the labels can be removed after each cycle. Unlike sequencing platforms that image amplified DNA, Helicos does not need to incorporate labeled nucleotides with 100 percent efficiency, Ulmer said, because each strand is imaged separately and strands do not need to grow in sync.
“If a given molecule on your surface doesn’t incorporate a base on one chemistry cycle, it doesn’t matter, because the next time you come around with the same base, … you can sort of pick up the trail,” he explained.
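Ulmer's point about strands not needing to grow in sync can be illustrated with a toy simulation. The sketch below is not Helicos' software. It assumes a fixed, hypothetical flow order (`CTAG`), at most one incorporation per strand per cycle, and an arbitrary per-cycle incorporation efficiency. For simplicity it treats the template as the list of bases to be incorporated, ignoring complementarity.

```python
import random

CYCLE_ORDER = "CTAG"  # hypothetical order in which bases are flowed


def read_strand(template, n_cycles, efficiency, rng):
    """Simulate one strand and return the bases it incorporated, in order."""
    pos = 0
    read = []
    for cycle in range(n_cycles):
        if pos >= len(template):
            break
        offered = CYCLE_ORDER[cycle % len(CYCLE_ORDER)]
        # Assumption: at most one base is added per cycle, and a matching
        # base is incorporated only with probability `efficiency`.
        if offered == template[pos] and rng.random() < efficiency:
            read.append(offered)
            pos += 1
        # A missed incorporation is not an error: the strand simply waits
        # until the same base is offered again in a later cycle.
    return "".join(read)


def simulate(template, n_strands=1000, n_cycles=60, efficiency=0.8, seed=0):
    """Simulate many strands, each tracked independently of the others."""
    rng = random.Random(seed)
    return [read_strand(template, n_cycles, efficiency, rng)
            for _ in range(n_strands)]


if __name__ == "__main__":
    template = "ACGTTGCAAGCTTGGCACTGGCCGTCG"  # made-up example sequence
    reads = simulate(template)
    # Strands fall out of step with each other, yet every read is still
    # a correct prefix of the template.
    assert all(template.startswith(r) for r in reads)
    lengths = sorted(len(r) for r in reads)
    print("shortest / median / longest read:",
          lengths[0], lengths[len(lengths) // 2], lengths[-1])
```

In this toy model, lower efficiency only shortens the reads obtained in a given number of cycles; it does not corrupt them. An ensemble method that images many amplified copies at once would instead see a blurred signal once the copies drift out of phase.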
But the published technology can still be improved in several areas, according to scientists, and Helicos has already talked about some of its plans.
For example, in the M13 project, the company sequenced 280,000 DNA molecules in a run, “only a fraction of what Illumina or ABI can do in a single lane,” Gao said. However, Helicos has said that its flow cells allow densities of at least 100 million DNA strands per square centimeter in principle.
Also, the scientists reported “significant” sequence errors in longer homopolymers, especially in cytosine runs. However, they said they could compensate for errors in C homopolymers by sequencing the complementary strand.
For more than a year, the company has talked about “virtual terminator” nucleotides it has developed, which prevent the polymerase from incorporating more than one nucleotide at a time.
The dominant error type the scientists reported in their paper was deletions, which ranged between 2 and 7 percent in single sequence reads. However, by sequencing the same strand twice — a feature only possible in single-molecule sequencing — they were able to reduce the combined deletion error to between 0.2 and 1 percent.
Two-pass sequencing, though, comes at a cost of doubling the time required for data acquisition. “You have a trade-off between the level of accuracy that you want and the rate of [sequencing],” according to Braslavsky.
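The accuracy-versus-speed trade-off can be put in rough numbers. The sketch below is a back-of-the-envelope model, not the method used in the paper. It assumes that deletions in the two passes are statistically independent and that a deletion survives only if both passes miss the same base. The function names are illustrative.

```python
def two_pass_deletion_rate(single_pass_rate):
    """Idealized combined rate: both independent passes must miss the base."""
    return single_pass_rate ** 2


def acquisition_time_factor(n_passes):
    """Data-acquisition time grows linearly with the number of passes."""
    return n_passes


if __name__ == "__main__":
    for p in (0.02, 0.07):  # single-read deletion rates reported in the study
        print(f"single pass {p:.1%} -> idealized two-pass "
              f"{two_pass_deletion_rate(p):.2%} "
              f"at {acquisition_time_factor(2)}x acquisition time")
```

Under this idealized model, the 2 to 7 percent single-pass range would fall to about 0.04 to 0.49 percent. That is lower than the 0.2 to 1 percent the scientists reported, so the independence assumption is probably too optimistic, or the combined error was computed differently.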
According to Gao, the deletion error rate “is still too high” and “it seems no easy solution [is] available.” However, Gao said, “one nice surprise is that the error rate stays flat as the read length increases,” whereas for other sequencing technologies, the error rate increases with read length as the signal gets weaker.
The study also does not show data for paired-end reads, although the company has said it is working on this capability. “This will be a disadvantage for Helicos,” Gao said, since other platforms already offer this feature.
Gao also pointed out that in their study the scientists used between 100 nanograms and 2 micrograms of genomic DNA, presumably because they had to ligate on an adaptor for two-pass sequencing. “For applications like ChIP-Seq, it is not easy to get that amount,” he said.
The system’s average read length in the study was 23, but Helicos said in the paper that it has already performed runs “with average lengths as high as 30.”
Helicos declined to provide officials to be interviewed for this article, but said in a press release that it plans to publish “many other” scientific reports in the coming months that will focus on BAC sequencing and microRNA sequence analysis, projects the company presented at the Advances in Genome Biology and Technology meeting earlier this year.
At the meeting, Bill Efcavitch, Helicos’ senior vice president for product R&D, said that the company has sequenced a canine BAC with 15-fold coverage and an error rate “close to” 0.5 percent, covering 99.8 percent of the bases.
At the same meeting, Tim Harris, Helicos’ senior director of research and the lead author of the M13 study, showed results from a human brain miRNA-sequencing project that used Helicos’ technology “to obtain accurate enumeration of related [miRNA] families” and “to identify new miRNA members,” according to the conference abstract.
In the meantime, the current study proves that single-molecule sequencing is feasible. “It’s extremely gratifying to me, after having been in this space for 20-odd years, to finally see this level at last of validation of the concept,” Ulmer said.
In 1987, he founded a company, Seq, that attempted to develop an exonuclease-based single-molecule sequencing technology. Helicos’ Harris worked at Seq in the 1990s. “There were plenty of people who thought I was absolutely nuts back in 1987, proposing that you will never be able to do sequencing by some kind of single-molecule method,” according to Ulmer.
Other companies are working on single-molecule sequencing-by-synthesis as well, including VisiGen Biotechnologies and Pacific Biosciences. However, their methods record the sequence in real time and promise longer reads than Helicos’ system can provide.