A 2007 study in the Netherlands comparing tag-based sequencing on Illumina’s Genome Analyzer with five different microarray platforms for gene-expression analysis has found that the sequencing tool provided more robust, comparable, and richer data than any of the array platforms.
The scientists, led by the Leiden Genome Technology Center, published the results of their study online in Nucleic Acids Research this month. In future experiments, they plan to replace microarrays with sequencing.
The Dutch scientists are not the only ones evaluating the ability of a second-generation sequencing platform to perform gene-expression analysis. Rick Jensen at Virginia Tech and researchers at Applied Biosystems, for example, have independently completed similar comparative studies using the Microarray Quality Control, or MAQC, reference RNA samples with the Roche/454 GS FLX and the ABI SOLiD platform, respectively (see In Sequence 4/22/2008).
“All of these next-gen transcriptome studies point to advantages [of next-gen sequencing] for gene expression analysis in sensitivity, specificity, and the detection of novel transcription variants,” Jensen told In Sequence by e-mail this week. “However, the less costly hybridization technologies will continue to play an important role in assays of large numbers of samples.”
The Dutch study was one of the first the LGTC researchers conducted after receiving their first Illumina Genome Analyzer — the center now owns two instruments — in early 2007. “Of course the first question people ask is, ‘how does it compare to existing methods?’” said Peter-Bram ‘t Hoen, an assistant professor at the Leiden University Medical Center.
He and his colleagues used tag-based transcriptome sequencing, also known as digital gene expression tag profiling, on the Illumina sequencer to generate gene-expression profiles for several samples from the hippocampus of wild-type and transgenic mice that overexpress a splice variant of a kinase. They analyzed four samples for each type of mouse, generating approximately 2.4 million sequence tags per sample.
Previously, they had analyzed the exact same RNA samples on five different genome-wide microarray gene expression platforms. That study, published earlier this year in BMC Genomics, detected few differences in expression between the two groups of samples.
The array platforms they compared the Genome Analyzer data with were the Applied Biosystems ABI 1700, the Affymetrix Mouse Genome 430 v2.0 array, the Agilent-WMG G4122A, the Illumina Sentrix Mouse-6 Expression BeadChip, and a home-spotted 65-mer oligonucleotide array.
The researchers found “many more” differentially expressed genes by sequencing than using arrays. One of the reasons is that sequencing can pick up more low-abundance transcripts, according to ‘t Hoen. Also, microarrays have trouble recording differences between low-intensity transcripts because of the fluorescent background signal. “This background is essentially absent when you sequence … and that gives you more power in the low-intensity range,” he explained.
Also, fold-changes in expression between genes from the two types of samples were greater with sequencing than with arrays due to “the well-known effect that microarrays tend to compress the ratios,” he said.
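The ratio-compression effect can be illustrated with a toy calculation. The sketch below is not taken from the study: it assumes a simple model in which each measurement carries the same additive background, and the function name and all numbers are hypothetical, chosen only to show the direction of the effect.

```python
def observed_ratio(signal_a, signal_b, background=0.0):
    """Ratio of two measurements that share the same additive background.

    Hypothetical helper for illustration; the additive-background model
    is an assumption, not the method used in the study.
    """
    return (signal_a + background) / (signal_b + background)


# Made-up abundances with a true two-fold difference.
no_background = observed_ratio(200.0, 100.0)                      # 200 / 100
with_background = observed_ratio(200.0, 100.0, background=100.0)  # 300 / 200

print(no_background, with_background)
```

Under this assumption, the same two-fold difference shrinks to 1.5-fold once a background equal to the weaker signal is added, and the shrinkage grows as the signals approach the background level.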
Some of the results could not have been obtained with microarrays. For example, the scientists found that about half of all genes were transcribed from the antisense strand. Also, they discovered alternative polyadenylation sites for almost half of all genes.
Overall, there was not a lot of overlap between differentially expressed genes discovered by sequencing and by the five array platforms, with the “most significant” overlap occurring with the Affymetrix platform.
One reason for that lack of correlation was the existence of different transcript variants that are expressed at different levels. While sequencing recorded these variants separately, arrays lumped them together in one measurement.
Also, the samples the scientists chose to study were challenging because they were known to harbor only subtle differences in gene expression.
That is probably also why the correlation between sequencing and qPCR, which the researchers conducted for 62 genes, was only modest.
“Some of the changes that we wanted to confirm are very small changes, like 1.3-fold, which are difficult to detect with quantitative PCR,” ‘t Hoen said. Also, antisense transcripts might have interfered with the qPCR assays, he added.
One noteworthy finding was that the sequencing results were very similar to data the scientists had obtained from Illumina, which sequenced the same samples for them before the center decided to acquire a Genome Analyzer.
This consistency was in contrast to microarray experiments. “For a microarray, our experience is, in that respect, very bad,” ‘t Hoen said. “It’s very difficult to get consistent results across laboratories.”
Being able to reproduce results between labs will make it easier to compare datasets generated at different sites, he said.
Sequencing-based gene expression analyses could still be improved by moving from 3’-tag sequencing to mRNA sequencing, or transcriptome sequencing, allowing researchers to identify alternative splicing along the entire transcript. Several groups have already performed such studies (see, for example, In Sequence 5/6/2008).
However, “you need to sequence a lot more when you want to measure the whole transcriptome, compared to when you only measure 3’ tags,” ‘t Hoen said. “With the current sequencing power that we have, that’s still a limitation” because the “cost for sequencing will get considerable.”
Right now, the Leiden Genome Technology Center charges approximately two to three times as much per sample for tag-based sequencing on the Illumina GA as for a comparable microarray experiment. “But for that, you get a lot more data, and a lot more precise data,” ‘t Hoen said. He and his colleagues have also shown that because of the lower technical variability between sequencing experiments, fewer technical replicates are required than for array-based experiments, lowering the cost of a study.
Besides cost, there are no disadvantages to sequencing-based gene expression analysis, according to ‘t Hoen, but “when it comes to data analysis, the tools for microarrays are much further developed,” he said, adding that “the real bottleneck for many of the sequencing applications is really data analysis at the moment.”
Though tools are available for many applications, “they are still command-line based tools, so they are not very user-friendly,” he said. “For the general user, who doesn’t have a lot of access to bioinformaticians and biostatisticians in the group, that may be an impeding factor to starting a sequencing experiment.”
Also, data storage capacity is currently “more of a limiting factor than the sequencing machine” at the Leiden center, he said. Customers need to bring a portable hard disk, he said, because data from one experiment can easily reach a terabyte. “Those are things that can be solved, but it’s more difficult than for microarrays at the moment.”
According to ‘t Hoen, the LGTC is currently switching to the Illumina platform for sequencing-based gene expression studies. “For ongoing experiments, we tend to still use microarrays, because it’s too difficult to compare data from sequencing experiments directly to data from microarray experiments,” he said. “But for new experiments, we use sequencing.”