Researchers at Yale University have used Solexa sequencing to generate a high-resolution transcriptome map of the yeast genome, allowing them to detect novel details of how the genome is expressed that microarray-based methods would be unable to reveal.
Sequencing fragmented cDNAs on Illumina’s Genome Analyzer and mapping the reads to the yeast genome, the scientists identified untranslated regions, introns, alternative initiation codons, upstream open reading frames, unexpected 3’ ends, and overlapping genes.
The results, which appeared online in Science last week, indicate that “the yeast transcriptome is more complex than previously appreciated,” according to the study, and pave the way for similar analyses of the human genome.
Overall, the sequencing-based method, though more expensive, performed “much better than microarrays,” senior study author Mike Snyder told In Sequence last week.
For example, the scientists did not experience any background noise — none of the sequence reads mapped to a region of the genome that was deleted in their experimental strain. With microarrays, on the other hand, “the biggest problem is cross-hybridization with the probes on the array,” said Snyder, who is a professor of molecular, cellular, and developmental biology at Yale.
Snyder said that he and his colleagues found transcriptome sequencing to be quantitative and more sensitive than microarrays, with a higher dynamic range.
For example, the researchers observed an 8,000-fold dynamic range, whereas similar experiments with microarrays only give a 60-fold dynamic range, Snyder said.
Especially at the high end of gene expression, where microarrays tend to get saturated, and at the low end, where cross-hybridization interferes with sensitivity, the sequencing-based results matched better with qPCR experiments than results from microarrays.
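To put the two fold-change figures Snyder quoted on a common scale, the short sketch below converts a fold range into log2 units (doublings). The function names and the toy expression values are illustrative assumptions, not part of the study.

```python
import math

def fold_range(expression_levels):
    """Ratio of the highest to the lowest non-zero expression level."""
    positive = [x for x in expression_levels if x > 0]
    return max(positive) / min(positive)

def as_log2(fold):
    """Express a fold range as a number of doublings."""
    return math.log2(fold)

# Fold ranges quoted in the article; the platform labels are just dictionary keys.
quoted = {"sequencing": 8000, "microarray": 60}
for platform, fold in quoted.items():
    print(f"{platform}: {fold}-fold is about {as_log2(fold):.1f} doublings")

# Hypothetical read counts per gene, only to show how a fold range is derived.
toy_counts = [0, 2, 50, 16000]
print(fold_range(toy_counts))
```

On this scale the quoted sequencing range spans roughly 13 doublings, compared with about 6 for the microarray range.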
Bias was not much of a problem in this study, he said, probably because the transcribed sequences have a more similar GC composition than the genome as a whole. “We have noticed a bias when doing genome sequencing,” he said, but “it’s not as big an issue in transcription sequencing than it is in genome sequencing.”
In addition, sequencing allowed the researchers to precisely determine the boundaries of, for example, untranslated regions, introns, or 3’ ends of genes, whereas microarrays only yield approximate boundaries.
“The data quality is better” than from microarrays, Snyder said.
For example, the researchers found that about a third of the expressed yeast genes overlap at their 3’ ends. “You can never sort that out with a microarray, at least using double-stranded cDNA for probing microarrays, which is what most people do,” Snyder said.
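One way such overlaps could be detected from mapped transcript boundaries is sketched below. It assumes that the overlapping genes sit on opposite strands and are transcribed toward each other, which the article does not state explicitly. The `Transcript` record, the coordinates, and the gene names are hypothetical.

```python
from collections import namedtuple

# start <= end are genomic coordinates (inclusive); strand is "+" or "-".
Transcript = namedtuple("Transcript", "name strand start end")

def three_prime_overlaps(transcripts):
    """Find tail-to-tail pairs whose 3' ends overlap.

    Assumption: a plus-strand transcript ends at its highest coordinate and a
    minus-strand transcript ends at its lowest, so the pair overlaps at the
    3' ends when the minus-strand start falls inside the plus-strand transcript.
    """
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    pairs = []
    for p in plus:
        for m in minus:
            if p.start <= m.start <= p.end <= m.end:
                overlap_length = p.end - m.start + 1
                pairs.append((p.name, m.name, overlap_length))
    return pairs

# Hypothetical transcript boundaries, for illustration only.
toy = [
    Transcript("A", "+", 100, 500),
    Transcript("B", "-", 450, 900),
    Transcript("C", "+", 1000, 1500),
    Transcript("D", "-", 1600, 2000),
]
print(three_prime_overlaps(toy))
```

Because the check depends on knowing which strand each transcript comes from, it also illustrates why probing with double-stranded cDNA, as Snyder notes, cannot resolve these cases.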
Though he did expect to discover novel aspects of the yeast transcriptome in this study, the extent of overlap at the 3’ ends of genes surprised him. “I really wasn’t expecting that,” he said.
At the time the study was performed, it cost about six times more than a microarray experiment, Snyder estimated, “but on the other hand, I would say the data quality is better.” Today, because of technical improvements to the Illumina platform and price cuts, the same study would cost approximately three times as much as a microarray experiment, and “with the way sequencing is going, I would expect [the price] to drop even further,” he said.
For gene-expression studies, he said, “sequencing-based methods will probably take over in the future,” whereas microarrays will have new niches, for example for selecting regions of the genome prior to sequencing.
As part of their current study, the scientists generated almost 30 million 35-base sequence reads on Illumina’s Genome Analyzer, of which only about 56 percent mapped to unique regions in the yeast genome.
“I am not sure it’s the best way to do it, but it’s currently the way everybody does this,” Snyder said. “If [reads] match to more than one place in the genome, they are usually just put aside.”
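The practice Snyder describes, keeping only reads that match a single place in the genome, can be illustrated with a minimal sketch. It uses exact string matching on a toy sequence rather than a real short-read aligner, and the function names, genome, and reads are all assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def find_hits(read, genome):
    """Return every exact match of the read as (strand, position) tuples."""
    queries = [("+", read)]
    rc = reverse_complement(read)
    if rc != read:  # a palindromic read would otherwise be counted twice
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        position = genome.find(query)
        while position != -1:
            hits.append((strand, position))
            position = genome.find(query, position + 1)
    return hits

def keep_unique(reads, genome):
    """Keep reads with exactly one hit; set aside unmapped and multi-mapped reads."""
    unique = {}
    for read in reads:
        hits = find_hits(read, genome)
        if len(hits) == 1:
            unique[read] = hits[0]
    return unique

# Toy genome and reads, for illustration only.
toy_genome = "GATTACAGGGATTACATTTCCG"
toy_reads = ["GATTACA", "CAGGG", "CGGAAA", "ACGTACGT"]
unique = keep_unique(toy_reads, toy_genome)
print(unique)
print(f"{len(unique) / len(toy_reads):.0%} of reads mapped uniquely")
```

In this toy case the repeated read is discarded along with the unmapped one, which shows how repetitive regions lose coverage under this rule.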
In the future, Snyder and his colleagues plan to sequence more transcriptomes. “We are certainly applying it to humans, as are other groups,” he said.
Sequencing cDNAs with short reads from technologies like the Illumina GA or ABI’s SOLiD, which Snyder has not evaluated yet, is well suited to determining which exons are expressed, where their boundaries are, and whether splicing occurs, he said.
The reads are too short, though, to determine what combinations of multiple exons are used. “I think as you get into more complex genomes, which have more exons and more introns per gene, longer reads would be advantageous for maintaining connectivity of the exons,” he said.
Technologies with longer reads, like 454’s, might be more appropriate for these studies, but are more expensive and less useful for tag sequencing, he said. “It might be hard to get the same depth, but you get better connectivity.”
This is not the first study to use Illumina’s GA to sequence cDNAs on a genome-wide scale: two weeks ago, scientists from Joe Ecker’s group at the Salk Institute published a study in which they sequenced the transcriptome of several Arabidopsis mutants (see In Sequence 4/29/2008).
Other research groups have used 454’s technology to analyze transcriptomes. For example, in February, a team led by Brigham and Women’s Hospital in Boston published a study in which they analyzed both gene expression levels and SNPs in lung cancer samples, using 454’s GS 20 (see In Sequence 2/26/2008).