Academic researchers and sequencing platform vendors have independently been evaluating how new high-throughput sequencers perform in gene-expression studies, and have been using standardized samples to compare them to microarray-based platforms.
Following in the footsteps of the MicroArray Quality Control project, Rick Jensen’s group at the Virginia Bioinformatics Institute at Virginia Tech, in collaboration with Roche/454, has used two standardized MAQC samples to assess how the 454 platform performs in gene-expression studies.
Meanwhile, Illumina and Applied Biosystems, using the same samples, have evaluated their own systems in house.
Each of the groups said they would be interested in participating in a formal MAQC-like comparison of their platforms.
The original MAQC project, led by scientists from the US Food and Drug Administration, measured gene-expression levels in two standardized RNA samples on seven microarray platforms and three alternative expression platforms at three independent test sites.
The first phase of the MAQC study, which involved participants from 51 organizations, was published in Nature Biotechnology in 2006. The second phase of the study, which is focusing on microarray-based predictive models, is still ongoing.
Though the MAQC study focused primarily on microarray-based platforms, the same methodology can be used to evaluate any gene-expression technology, including those based on DNA sequencers, according to Jensen, who is a professor of biological sciences at Virginia Tech.
He said that because the new sequencing platforms directly count the sequence reads derived from each type of RNA molecule in a sample, rather than measuring hybridization signals, they may become the “gold standard” in gene-expression analysis in the future.
However, “in order to answer the questions of whether [they] really [are] more sensitive, more specific, more reproducible, and more accurate [than arrays], the same tests should be performed on the transcriptome sequencing methodologies that were performed for the microarrays,” he told In Sequence last week.
Jensen’s group has been assessing 454’s GS FLX platform for about a year by using the MAQC study’s Universal Human Reference RNA, or Sample A, from Stratagene, a subsidiary of Agilent Technologies, and the Human Brain Reference RNA, or Sample B, from Ambion, a subsidiary of ABI. Both samples are commercially available.
“They were specifically chosen to be as different as possible so that you literally have [at least] 10,000 genes that are differentially expressed, with both high and low abundance, with both large and small changes,” Jensen noted. “At this point, they are probably the best characterized RNA samples since they have been examined with more than a half dozen microarray platforms, as well as [by] extensive RT-PCR over 1,000 randomly chosen genes.”
Jensen and his colleagues are currently analyzing almost 2 million 454 reads for each of the A and B samples and have explored different ways of preparing the cDNA prior to sequencing. Roche’s 454 provided sequencing reagents, prepared some of the samples, and consulted on the project, according to a Roche spokesperson.
“Part of the reason that we did the project was to encourage the community, both the other vendors but also the people that originally set up the MAQC project, to do a more formal, more thorough investigation of all the platforms,” the spokesperson said.
In February, at this year’s Advances in Genome Biology and Technology meeting, Jensen presented an analysis of a preliminary, smaller dataset that showed the technology is “very promising” for gene-expression studies.
“You are getting excellent specificity” and “very good sensitivity” compared to microarrays and TaqMan assays, Jensen said. Sensitivity, he said, increases with the depth of sequencing.
Also, the error model is “very simple” compared to microarrays, where error models “have been debated for years,” he added.
Finally, the accuracy of the technology, he said, is “comparable to the DNA microarrays.”
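Jensen's points about sensitivity and error modeling can be illustrated with a simple counting model. The Python sketch below is not his analysis. It assumes, for illustration, that reads are sampled independently from the transcript pool, so the count for a transcript making up a fraction f of the library is roughly Poisson-distributed with mean N × f for N reads. Under that assumption, the probability of detecting the transcript at least once is 1 − e^(−N·f), which climbs with sequencing depth, and the counting noise follows directly from the mean. The function names and the one-in-a-million abundance are hypothetical.

```python
import math

def detection_probability(fraction: float, total_reads: int) -> float:
    """Chance of seeing at least one read from a transcript.

    Assumes reads are drawn independently, so the transcript's count is
    Poisson with mean total_reads * fraction; P(count >= 1) = 1 - exp(-mean).
    """
    return 1.0 - math.exp(-total_reads * fraction)

def poisson_cv(expected_count: float) -> float:
    """Relative noise of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_count)

if __name__ == "__main__":
    rare = 1e-6  # hypothetical: one read in a million comes from this transcript
    for depth in (100_000, 1_000_000, 2_000_000, 10_000_000):
        p = detection_probability(rare, depth)
        cv = poisson_cv(depth * rare)
        print(f"{depth:>10,} reads: P(detect) = {p:.3f}, CV of count = {cv:.2f}")
```

At about 2 million reads, roughly the per-sample depth Jensen's group is analyzing, a one-in-a-million transcript would still be missed about one time in seven under this model. Real libraries also carry biases from cDNA preparation that the model ignores.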
“The only obvious downside to the technology is that it is still very expensive compared to microarrays,” Jensen said. However, as the cost of sequencing declines “it will be an attractive alternative to microarrays for discovery of gene-expression differences.”
Another reason for this potential appeal is that transcriptome-sequencing studies yield more information than just gene-expression levels. “You not only get gene-abundance information, like from traditional microarrays, but you also get splice-variant information [and SNP information],” Jensen said.
“In the microarray world, you would run an expression array to get abundance, you would run an exon array in order to get splice variant information, and you run SNP arrays to get more global SNP information about the sample,” he said. “You get all this at once in sequencing.”
According to Jensen, the long reads of the 454 technology are an advantage over short-read technologies such as Illumina’s or ABI’s because they are easier to map and assemble into genes. However, he said, the trade-off is the cost of generating the data, which is higher for 454.
Jensen, who was part of the original MAQC study, said he would welcome a cross-platform comparison of several sequencing systems using MAQC’s A and B samples, though he said he does not know if anyone is planning such a study. “I would personally be very interested in getting everybody together and comparing data,” he said.
Assessing the sequencing platforms might also reveal their biases. “Right now, we are just learning what the different biases are, and the different technologies — Solexa, ABI, 454, Helicos, and anybody else coming online as well — may separately introduce different biases. We won’t know them unless we assay them using the standard samples,” Jensen said.
For instance, the way mRNA is converted into cDNA may “skew the quantitative interpretation of the data,” he said.
Both Illumina and ABI told In Sequence that they would be glad to participate in a formal study comparing their sequencing platforms for gene-expression applications. In fact, the MAQC consortium is considering such a study.
“There have been serious discussions about including [a] performance assessment of sequencing-based technologies under the MAQC umbrella, starting with the two reference RNA samples, A and B, used in the MAQC Phase I study in a similar experimental setting,” Leming Shi, a researcher in the division of systems toxicology at the FDA’s National Center for Toxicological Research and MAQC leader, told In Sequence by e-mail this week.
In the meantime, Illumina and ABI have been conducting their own internal studies using the A and B samples.
“As original participants in the MAQC study, we recognize the value of those samples,” Shawn Baker, senior product manager for gene expression at Illumina, told In Sequence this week. “We are using them for development purposes internally.”
For example, the company has been using the samples to help it develop a full-length cDNA sequencing assay, called mRNA-Seq, which quantifies transcripts by counting the corresponding reads.
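Counting-based quantification of the kind described for mRNA-Seq can be sketched in a few lines. This Python example is an illustration under stated assumptions, not Illumina's software: it takes hypothetical per-gene read counts for samples A and B, scales each sample to counts per million (CPM) to correct for differing sequencing depth, and reports a log2 fold change with a pseudocount to avoid dividing by zero. The gene names, counts, and function names are made up.

```python
import math
from typing import Dict

def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(cpm_a: float, cpm_b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of sample B to sample A; the pseudocount avoids log(0)."""
    return math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))

if __name__ == "__main__":
    # Hypothetical read counts per gene for MAQC-style samples A and B.
    sample_a = {"GENE1": 5_000, "GENE2": 200, "GENE3": 0}
    sample_b = {"GENE1": 1_000, "GENE2": 800, "GENE3": 50}
    cpm_a = counts_per_million(sample_a)
    cpm_b = counts_per_million(sample_b)
    for gene in sample_a:
        lfc = log2_fold_change(cpm_a[gene], cpm_b[gene])
        print(f"{gene}: A={cpm_a[gene]:.0f} CPM, B={cpm_b[gene]:.0f} CPM, log2FC={lfc:+.2f}")
```

Because each gene is compared with itself across the two samples, transcript length roughly cancels out of the fold change; comparing different genes within one sample would also require a length correction, such as reads per kilobase per million (RPKM).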
“What we have shown is that sequencing-based quantification of gene expression is more sensitive, and has a larger dynamic range, than any of the microarrays that are out there,” said Gary Schroth, Illumina’s senior director for expression application R&D.
While microarrays have a dynamic range of no more than 3.5 orders of magnitude, he said, sequencing-based technologies cover a range of 4, 5, “or maybe even more” orders of magnitude, “depending on how much you sequence.”
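The dynamic-range figures quoted here are logarithmic, so one order of magnitude is a tenfold difference. The hypothetical helpers below, written in Python for illustration, convert between ranges and fold differences and measure the range spanned by the nonzero read counts in one sample. Because the smallest nonzero count is a single read, that range tends to widen as more reads are sequenced, which is one way to read Schroth's “depending on how much you sequence.”

```python
import math
from typing import Sequence

def orders_of_magnitude(highest: float, lowest: float) -> float:
    """Dynamic range expressed as log10(highest / lowest)."""
    return math.log10(highest / lowest)

def fold_difference(orders_a: float, orders_b: float) -> float:
    """Factor by which a range of orders_a exceeds a range of orders_b."""
    return 10 ** (orders_a - orders_b)

def count_range_orders(counts: Sequence[int]) -> float:
    """Orders of magnitude spanned by the nonzero read counts of one sample.

    The smallest possible nonzero count is a single read, so deeper
    sequencing tends to raise the ceiling relative to that fixed floor.
    """
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        raise ValueError("no gene was detected")
    return orders_of_magnitude(max(nonzero), min(nonzero))

if __name__ == "__main__":
    # Array range vs. sequencing range, using the figures quoted in the text.
    print(f"3.5 orders = {10 ** 3.5:,.0f}-fold span")
    print(f"5 orders   = {10 ** 5:,.0f}-fold span")
    print(f"gain of 5 over 3.5 orders: {fold_difference(5, 3.5):.1f}x")
    # Hypothetical per-gene read counts from one sample.
    print(f"sample span: {count_range_orders([0, 1, 12, 450, 98_000]):.2f} orders")
```

For example, the 3.5 orders of magnitude cited for microarrays correspond to a span of about 3,162-fold (10^3.5), while 5 orders correspond to 100,000-fold.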
Baker said Illumina currently markets its protocols for mRNA-Seq and this summer plans to launch a commercial kit for the application that will include analysis software.
As it did in the earlier MAQC study, Illumina would participate in a formal study to compare sequencing platforms, Baker said. He said the major benefit of such a study is “getting this kind of data into the hands of people who have been using microarrays for a long time. [There are] so many more things you are seeing than with any microarray platform out there,” such as alternative splicing.
At ABI, too, researchers have used the MAQC samples to compare results from the SOLiD system directly with microarrays. Preliminary results “look very promising,” said Roland Wicki, ABI’s director of SOLiD gene expression strategy.
Both sensitivity and dynamic range are “much higher” than with microarray-based systems, he said. “Depending on how we do gene expression, [we achieve] a dynamic range of 5 to 6 orders of magnitude,” he said, which is about 1,000-fold higher than with microarrays.
ABI already offers protocols for SAGE-type gene-expression analysis on the SOLiD, and has started an early-access program for a small RNA-sequencing kit.
The company is also working on a kit for whole-transcript sequencing where users will be able to choose to sequence either mRNA or non-coding RNA, depending on the sample prep.
If the FDA or an academic institution were to organize a cross-platform comparison, ABI would be interested in participating, Wicki said. He cautioned, though, that such a study “has to be thought about very thoroughly because there is so much more you can do on next-generation sequencers [than on microarrays],” for example, studying alternative splice forms.
Helicos Biosciences did not respond before deadline to a request about whether it is also evaluating the HeliScope platform using the MAQC samples.