Transcriptome sequencing, or RNA-Seq, on second-generation sequencers has produced convincing results, according to recent studies published by several research groups.
Most recently, a team from the University of Chicago and Yale University used the Illumina GA sequencer to assess the method’s technical reproducibility, and to compare it with gene-expression microarrays.
Illumina and ABI have long claimed that “digital gene expression” analysis using their respective second-generation sequencing systems will soon become a major competitor to traditional gene expression microarrays.
In their study, Yoav Gilad, Matthew Stephens, and colleagues analyzed the technical variance associated with Illumina mRNA sequencing and compared its performance with results from Affymetrix arrays.
The findings, which appeared online this month in Genome Research, show that the sequencing data were highly reproducible and enabled the researchers to analyze splice variants, novel transcripts, and genes expressed at low levels, analyses that the microarray data did not permit.
The study followed a recent article in Nature Methods by Barbara Wold’s group at Caltech as well as a number of other recent reports (see In Sequence 3/6/2008, 5/20/2008, 5/6/2008, 4/29/2008) that used Illumina’s Genome Analyzer or Applied Biosystems’ SOLiD for transcriptome sequencing.
Gilad’s laboratory in the department of human genetics at the University of Chicago has used microarrays to study gene expression in primates for several years.
“We were just looking at the sequencers as another option to get even better data to study the same type of questions,” he told In Sequence recently. “My particular interest … was to say, ‘How well is this [technology] performing compared to my microarrays?’”
To arrive at the results presented in their paper, the researchers purified mRNA taken from liver and kidney tissues biopsied from a single human subject. They then generated cDNA template libraries from the samples and sequenced them on Illumina’s Genome Analyzer.
To assess the technical variance within and between runs, researchers from the Yale side of the partnership sequenced each sample seven times, split between two machine runs. Each sample was sequenced at two different concentrations: five times at 3 pM and twice at 1.5 pM.
The team also analyzed the same RNA samples with Affymetrix U133 Plus 2 arrays, using three arrays, or technical replicates, per sample. Sample preparation and data analysis were designed “to be as similar to the sequence-based approach as possible,” according to the article.
Each lane of sequencing generated between about 13 million and 15 million reads at the higher sample concentration, and between 8 million and 9 million reads at the lower concentration. About 40 percent of these reads mapped uniquely to a location in the genome.
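As a rough back-of-the-envelope check on those figures, the short Python sketch below converts the reported per-lane read counts and the roughly 40 percent unique-mapping rate into an approximate number of uniquely mapped reads per lane. The function name and the rounding are illustrative assumptions, not anything taken from the paper.

```python
def uniquely_mapped_reads(total_reads, unique_fraction=0.40):
    """Approximate count of reads mapping to a single genomic location."""
    return round(total_reads * unique_fraction)

# Approximate per-lane read ranges as reported in the article.
lanes = {
    "3 pM": (13_000_000, 15_000_000),
    "1.5 pM": (8_000_000, 9_000_000),
}

for concentration, (low, high) in lanes.items():
    print(concentration, uniquely_mapped_reads(low), uniquely_mapped_reads(high))
```

Under these assumptions, a 3 pM lane yields on the order of 5 million to 6 million uniquely mapped reads, and a 1.5 pM lane roughly 3.2 million to 3.6 million.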
One lane of Solexa sequencing, the researchers found, is approximately equivalent to one microarray, but delivers more information. “That’s probably the take-home message from our paper,” said Gilad. “By having the same number of lanes as previously you had microarrays you are probably getting comparable results, as well as the ability to perform additional analyses, such as alternative splicing, and perhaps looking at ... regions in the genome that were not known to be expressed.”
Gilad and his colleagues recorded little variation between results from the same sample sequenced in different lanes. “At least in our hands, as long as you sequence the samples at the same concentration, there is relatively little effect of the lanes, such that very few genes would be identified as differentially expressed spuriously,” he said. “We expected this [lane effect], actually, to be large. That’s why we replicated so much.”
However, that does not mean differences did not exist. “For about 0.5 percent of the genes, you actually do see a pretty sizable effect,” he said.
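To make the idea of a lane effect concrete, here is a minimal Python sketch of one way to check whether a gene's read counts in two lanes of the same sample are consistent with the lanes' sequencing depths. It uses a plain Pearson chi-square statistic, which is an assumption chosen for illustration and not necessarily the test used in the paper. All counts below are hypothetical.

```python
def lane_chi_square(gene_counts, lane_totals):
    """Pearson chi-square statistic for one gene across lanes.

    Expected counts assume the gene's reads split across lanes in
    proportion to each lane's total mapped reads (no lane effect).
    """
    gene_total = sum(gene_counts)
    depth_total = sum(lane_totals)
    stat = 0.0
    for observed, depth in zip(gene_counts, lane_totals):
        expected = gene_total * depth / depth_total
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical numbers: two lanes of the same sample at the same concentration.
lane_totals = [5_000_000, 5_500_000]
consistent_gene = [100, 110]  # tracks lane depth exactly
shifted_gene = [100, 200]     # more reads in lane 2 than depth alone predicts

# Chi-square critical value at the 0.05 level with 1 degree of freedom.
CRITICAL_0_05_DF1 = 3.841

for name, counts in [("consistent", consistent_gene), ("shifted", shifted_gene)]:
    stat = lane_chi_square(counts, lane_totals)
    print(name, round(stat, 2), stat > CRITICAL_0_05_DF1)
```

A gene whose statistic exceeds the critical value would be flagged as showing a lane effect. Applied across thousands of genes, a check like this would also need a correction for multiple testing, which the sketch leaves out.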
Because they focused much of their effort in this study on possible lane effects, the researchers did not replicate the sample preparation, another possible source of variation. They plan to study sample prep-related variation as part of future studies.
“I don’t expect it to contribute much, given my experience with microarrays [where] the sample preparation steps are pretty similar, but we will certainly now pay more attention to it,” Gilad said.
He would not say whether sequencing or microarrays are better at truly representing gene expression in a sample. “I think we can say that [sequencing] is at least as good,” he said, although “arrays have been performing brilliantly for many years.”
However, sequencing provides extra information that arrays don’t, according to Gilad. “The advantage here is that you are not limited by what probes are on the arrays. That’s why it’s so promising,” he said.
Array companies have maintained that they still have a cost advantage over RNA sequencing, but Gilad declined to provide a cost comparison. Prices change rapidly and vary between customers, companies, and array types, he said.
“It’s such a dynamic field, and such a dynamic pricing competition, I really want to stay out of that,” said Gilad.
Gilad plans to use transcriptome sequencing in upcoming biological studies. Until now, he has had to build a new custom microarray for each primate species he studied, which was more expensive than using catalog arrays.
“So now, for us, sequencing is actually a good solution because it’s a comparable price, and it affords us to work with a species even if we don’t have a microarray for it,” he said.