By Monica Heger
Researchers at the Wellcome Trust Sanger Institute have developed a strand-specific method of transcriptome sequencing on Illumina's Genome Analyzer that they claim can produce more accurate sequence data than Illumina's standard RNA-seq method.
In addition, the Sanger team's method is the only RNA-seq approach to date — aside from Helicos Biosciences' direct RNA sequencing method — that does not require an amplification step.
"It's the only protocol that is strand-specific with no amplification or artifacts, is compatible with paired-end sequencing, and works well with Illumina," said Daniel Turner, Sanger's head of sequencing technology development and senior author of a paper describing the approach that was published online in Nature Methods this week.
"I think this really strengthens the case that non-amplification methods give you a true biological representation [of the transcriptome]," said Patrice Milos, vice president and chief scientific officer of Helicos, whose direct RNA-sequencing method should be available to customers this year (see In Sequence 9/29/2009).
The key to the Sanger approach is that the reverse-transcription step occurs directly on the flow cell. This ensures strand specificity and eliminates the need for a PCR step. Turner said the method, dubbed FRT-seq for flow cell reverse transcriptase sequencing, should be particularly useful for samples that have a high adenine and thymine content, because those tend to be underrepresented in protocols with a PCR step.
In a comparison of FRT-seq to Illumina's standard RNA sequencing protocol, the team found that it generated more sequence data, was more accurate, and had less bias and fewer duplicate reads.
The researchers tested the method on a human placental sample, preparing two libraries with their method and two with the standard method. The two FRT-seq libraries generated 3.3 gigabases and 3.5 gigabases of sequence data, compared with 1.6 gigabases and 0.5 gigabases for the two standard-protocol libraries.
To test reproducibility, the authors mapped the reads to annotated genes and compared the resulting gene-level read counts. The two FRT-seq libraries had a correlation of 0.993 with each other, and different lanes from the same library had correlations between 0.998 and 1.000. The two standard-method libraries had a correlation of 0.866 with each other, and lanes from the same library had a correlation of 1.000.
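For readers unfamiliar with this kind of reproducibility check, the sketch below shows one common way to compute a correlation between two libraries: count the reads mapped to each gene, then take the Pearson correlation of those counts over the genes both libraries share. It is an illustration under stated assumptions, not the authors' pipeline. The paper's exact statistic (Pearson vs. Spearman, raw vs. log-transformed counts) is not given here, and the function names and toy gene labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def library_correlation(counts_a, counts_b):
    """Correlate per-gene read counts from two libraries.

    counts_a, counts_b: hypothetical dicts mapping gene ID -> mapped read count.
    Only genes present in both libraries are compared (an assumption).
    """
    shared = sorted(set(counts_a) & set(counts_b))
    return pearson([counts_a[g] for g in shared], [counts_b[g] for g in shared])


# Toy example (invented counts, not data from the study):
lib1 = {"geneA": 1, "geneB": 2, "geneC": 3}
lib2 = {"geneA": 2, "geneB": 4, "geneC": 6, "geneD": 9}  # geneD is ignored
print(library_correlation(lib1, lib2))  # 1.0: counts are perfectly proportional
```

A value near 1.000 means the two libraries rank and scale gene expression almost identically, which is why lane-to-lane correlations within one library sit at the top of the range.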
The authors reported duplicate read percentages of 6.1 percent and 7.2 percent for the two libraries with FRT-seq, compared to 94.1 percent and 39.7 percent for the two libraries used for the standard method. The difference in duplication between the two methods was due primarily to bias from the PCR step in the standard method, said Turner.
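A duplicate-read percentage like the ones above is typically computed by treating reads that align to the same position on the same strand as copies of one original molecule. The minimal sketch below assumes that definition; the alignment-key layout, such as (chromosome, start, strand) for single reads or (chromosome, start, mate start, strand) for read pairs, is a hypothetical simplification, and the study may have used a different rule.

```python
def duplicate_rate(alignment_keys):
    """Percentage of reads whose alignment key repeats an earlier read's key.

    alignment_keys: iterable of hashable keys, e.g. (chrom, start, strand).
    Assumed definition: every read beyond the first at a given key counts
    as a duplicate.
    """
    keys = list(alignment_keys)
    if not keys:
        return 0.0
    unique = len(set(keys))
    return 100.0 * (len(keys) - unique) / len(keys)


# Toy example (invented alignments): one of four reads is a duplicate.
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # same position and strand as the read above
    ("chr1", 100, "-"),  # opposite strand, so not a duplicate here
    ("chr2", 100, "+"),
]
print(duplicate_rate(reads))  # 25.0
```

Including the strand in the key matters for a strand-specific protocol. Note also that in RNA-seq, very highly expressed genes can yield position-identical reads from distinct molecules, so not every duplicate is necessarily a PCR artifact.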
Milos said that while the Sanger team's approach is a step in the right direction, she questioned its library preparation, which involves a series of fragmentation, dephosphorylation, adapter-ligation, and phosphorylation steps.
"The more you touch a biological sample, the further you get from accurate quantification," she said.
In addition, Milos noted that although the method clearly gave more accurate results than RNA-seq methods that require amplification, its reported duplication rates of 6 to 7 percent could be reduced further with improved library preparation.
She also said that the 250 nanograms of polyadenylated (poly(A)+) RNA the method requires might be too much for some clinical samples, such as formalin-fixed tissue and circulating tumor cells. For samples like these, you "want to be looking at picogram quantities," she added.
Brenton Graveley, associate professor of genetics and developmental biology at the University of Connecticut, said that FRT-seq addresses many problems of current transcriptome-sequencing protocols, namely the biases introduced by PCR amplification and the strand switching that can occur because all the steps are performed in one test tube. FRT-seq prevents strand switching by keeping the strands separate on the flow cell.
"Based on what they described in the paper, it seems immediately useful," he said. "I think the technical quality is very high. It's very reproducible and very strand specific."
He agreed with Milos that the library preparation, and in particular the RNA ligation step, could be susceptible to biases. He said the RNA likely ligates more efficiently to some sequences than to others. Also, if the RNA folds on the flow cell, the resulting structures could inhibit reverse transcription of some of the RNA. "But, with any method there are biases," he said. "Until we can put the RNA in a machine and sequence it directly without doing anything, there will be biases. And even then, there may be biases."
Overall, however, Graveley said the method did a good job of addressing the major problems with current RNA-sequencing protocols.
"It seems like a great protocol and we'll definitely be giving it a shot," he said.
Turner said the next steps are to improve library preparation, particularly RNA fragmentation, and to experiment with different adapter compositions, although, he added, "even at this stage it's working pretty well." He wasn't sure of the method's exact cost but said it was likely comparable to that of Illumina's standard RNA-sequencing protocol.
Turner said that he and his colleagues began developing a PCR-free method for sequencing RNA because they had been working extensively with the malaria parasite, whose genome is composed primarily of the bases adenine and thymine. Sequencing its transcriptome with standard methods was tricky because PCR amplification is biased toward guanine- and cytosine-rich sequences, so AT-rich regions tended to drop out.
Aside from malaria, Turner said the method will be useful for sequencing a wide variety of transcriptomes. "It has the potential to be our standard RNA sequencing protocol," he said.