This article, originally published Sept. 23, has been updated with comments from researchers.
Helicos BioSciences has demonstrated that its platform can be used to directly sequence single molecules of RNA, without prior conversion to cDNA, a capability the company said could have advantages for analyzing small amounts of RNA or short RNAs and for providing unbiased sampling of transcriptomes.
In a paper published online in Nature last week, Helicos scientists reported that they have sequenced nanogram amounts of polyadenylated RNA from yeast using a prototype of the company's HeliScope sequencer. The study, they write, "provides a path to high-throughput and low-cost direct RNA sequencing and achieving the ultimate goal of a comprehensive and bias-free understanding of transcriptomes."
Depending on the outcome of additional experiments, the company hopes direct RNA sequencing could become available to early-access customers sometime next year, according to Patrice Milos, Helicos' vice president and CSO, who added that there seems to be demand for this application. "The amount of interest, we are learning, is actually quite large," she told In Sequence last week.
"Direct RNA sequencing is a long-awaited advance for transcriptome analysis," said Rick Jensen, a professor of biological sciences at Virginia Tech who has assessed 454's platform for gene-expression studies. RNA sequencing with a read length of hundreds or thousands of base pairs could "truly revolutionize our ability to characterize and quantify the transcriptome," he said.
Others agree. The approach "has tremendous promise," said Brenton Graveley, a professor in the Department of Genetics and Developmental Biology at the University of Connecticut Stem Cell Institute, and "this study is a step in the right direction and certainly a proof of principle." Graveley's group has used Illumina's Genome Analyzer in the past to study gene expression.
According to Edwin Cuppen, a professor of genome biology at the Hubrecht Institute in Utrecht, the Netherlands, direct RNA sequencing "will for sure be useful" but "it is still early days for this technology." His group has studied biases of current sequencing methods for digital gene expression analysis of small RNA (see In Sequence 7/14/2009).
At least one competitor in the area of single-molecule sequencing is also developing direct RNA sequencing — Pacific Biosciences revealed at a conference earlier this month that it has shown proof-of-principle for sequencing synthetic RNA oligonucleotides directly on its single-molecule real-time analysis platform, a capability it plans to add within a year after the commercial launch of its instrument in the second half of 2010 (see In Sequence 9/22/2009).
Helicos embarked on its proof-of-principle experiments for direct RNA sequencing about nine months ago, according to Milos. Its platform seemed very amenable to capturing polyadenylated RNA, she said, given that the company already uses an oligo-dT surface to capture DNA for sequencing.
The initial work required finding a suitable polymerase and virtual terminator nucleotides, which are proprietary to the company, as well as optimizing conditions for capturing RNA on the surface and for the sequencing reaction.
According to the paper, the scientists first add a poly-A-tail to the 3' ends of RNA molecules and then block the 3' end, hybridize the RNAs to the poly-dT surface, and fill in the poly-A-tail with dTTP. They next sequence the RNA in a stepwise manner using labeled virtual terminator nucleotides, similar to the way the HeliScope sequences DNA.
Initially, they used a synthetic 40-mer RNA oligo as a model system to develop and optimize the chemistry and found that about half the reads were at least 20 bases long, the longest being 38 bases. The sequencing provided 972 reads per 1,000 μm² of flow cell surface area, compared to 1,100 reads for DNA sequencing "in similar conditions," the scientists wrote.
The total raw base error rate of the process was approximately 4 percent, of which 0.1 to 0.3 percent were substitution errors. "Although further improvements in error rates are in progress, the read lengths and error rates achieved here are sufficient to allow the use of standard computational methods to align sequences to reference transcriptomes and genomes," according to the authors.
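To put the roughly 4 percent raw error rate in perspective, the short Python sketch below estimates how many errors a read of a given length would be expected to carry. It is a back-of-the-envelope illustration, not a calculation from the paper: it assumes errors occur independently and at the same rate at every position, which the article does not state, and the function names are hypothetical.

```python
# Rough illustration only. Assumes every base has the same, independent
# chance of being miscalled; real sequencing errors need not behave this way.

RAW_ERROR_RATE = 0.04  # approximate total raw base error rate reported


def expected_errors(read_length, error_rate):
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate):
    """Probability that a read contains no errors under the independence assumption."""
    return (1.0 - error_rate) ** read_length


# Read lengths mentioned in the article: the 20-base cutoff, the longest
# oligo read (38 bases), and the longest perfectly matched yeast read (50 bases).
for length in (20, 38, 50):
    errors = expected_errors(length, RAW_ERROR_RATE)
    clean = prob_error_free(length, RAW_ERROR_RATE)
    print(f"{length}-base read: ~{errors:.1f} expected errors, "
          f"{clean:.0%} chance of being error-free")
```

Under these assumptions, a 20-base read carries less than one expected error on average, which fits the authors' point that standard alignment tools, which generally tolerate a small number of mismatches or gaps, can still place such reads on a reference.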
Next, the Helicos researchers went on to sequence naturally polyadenylated RNA from Saccharomyces cerevisiae, starting with two femtomoles, or about two nanograms of isolated RNA. They noted that RNA stability "remained at high levels during the run," since about the same number of nucleotides were added in each cycle. The experiment generated on the order of 40,000 reads that were at least 20 bases long, of which about half aligned to the yeast genome. The longest perfectly matched read had a length of 50 bases.
Like previous studies that used yeast cDNA, the scientists found the 3' ends of the transcripts to be heterogeneous. They also found polyadenylated small nucleolar RNAs — a new finding.
All experiments were conducted on a prototype instrument, "essentially a single-flow-cell mini-HeliScope," according to Milos. The company, which has submitted an NIH grant application to do further work on direct RNA sequencing, is continuing to optimize the chemistry and is beginning validation studies on the commercial HeliScope. The goal of these studies is to obtain "the millions of reads that we would like to get a deep view of the transcriptome as a direct measurement," she said.
So far, the company has applied the method to "some other pools of small RNAs" as well as "an initial view of some eukaryotic RNA," Milos said, adding that it is too early to discuss the results.
Shortcut to RNA
In principle, direct RNA sequencing would have several advantages over cDNA-based transcriptome analysis — which includes all next-generation sequencing-based approaches to date, such as RNA-seq. For example, the authors note, some reverse transcriptases generate spurious second-strand cDNA from DNA; template switching, contaminating DNA, or primer-independent cDNA synthesis lead to artefactual cDNAs; and cDNAs tend to be error-prone due to the nature of reverse transcriptases. Also, many protocols make it impossible to determine RNA strandedness.
Graveley said that strand-switching, for example, occurs frequently enough during existing library preparation protocols that control experiments are necessary. "This can be circumvented to some extent today using either emulsion PCR or amplification-free library prep protocols," he said. "However, direct RNA sequencing has the potential to completely eliminate this."
According to Milos, experiments where Helicos scientists compared an RNA oligo sequenced directly and the same oligo sequenced via a cDNA intermediate showed that a small fraction of cDNAs resulted from template switching and that there was cDNA hairpin formation. "While it's a low fraction, a few percent, it still represents reads that would be present in any cDNA-based measurement," she said.
Jensen said that in order to evaluate the significance of biases introduced by the cDNA synthesis, it would be interesting to include direct RNA sequencing in an ongoing study by the Sequencing Quality Control Consortium that compares results from "all of the latest next-generation sequencing technologies" for sequencing the Microarray Quality Control RNA reference samples (see In Sequence 1/6/2009).
Helicos also still has a lot of work to do to make direct RNA sequencing competitive with cDNA-based approaches, according to several researchers.
For example, the company has yet to show that the approach is quantitative and allows researchers to do differential gene expression analyses, according to Cuppen. "I think proof for both will be required to compete with established RNA-seq techniques in the future," he said.
In addition, he said, Helicos' approach may not be bias-free when it comes to RNAs that are not naturally poly-adenylated, such as several types of small RNAs. "We have seen in our experiments that poly-adenylation enzymes have a severe bias" that differs between specific enzymes, Cuppen said.
He and others also pointed out that both read length and throughput need to increase in order to cover the entire length of transcripts, and to capture the transcriptome comprehensively. "To get complete transcriptome coverage, this will need to improve so that millions and millions of longer reads can be easily generated," Graveley said.
Potential challenges in sequencing certain types of RNA molecules could arise from RNA with nucleotide modifications, such as methylation, and from secondary structures, "which are much stronger in RNA than in DNA," according to Cuppen.
"We are anxious to see what happens when we scale it to 50 million or 100 million reads," Milos said, and whether a new view of the transcriptome will emerge. "And we won't know until we really do those next-stage proof-of-principle experiments."
Another advantage of direct RNA sequencing is that it could work with small amounts of RNA, as well as short RNA molecules, for example from formalin-fixed, paraffin-embedded tissues. "We envision a future where you can take a cellular lysate, directly hybridize the RNA from that lysate, and directly sequence the RNA that is captured," Milos said. Eventually, it might be possible to analyze RNA from single cells, she added.
The method is likely to be less expensive than RNA-seq, as well. "If you require little, if any, sample manipulation … it should be pretty cost-effective, particularly as the HeliScope continues to drive down cost," according to Milos.