In the burgeoning field of transcriptomics, it's no surprise that those who maintain an if-it-ain't-broke-don't-fix-it mentality are being outperformed by investigators willing to embrace the power (and the challenges) of nascent, if imperfect, technologies. As the gap between the cost of running gene-expression microarrays and that of RNA sequencing continues to narrow, some researchers are ditching array-based methods in favor of a technology that promises richer data for analysis.
But more isn't always better, some experts are quick to point out. The additional information RNA-seq provides comes with a heavier informatics workload, which is enough to push some labs over the edge and back on the phone with their array distributors. Still, though microarrays may currently be the go-to platform for expression studies of targeted genes, some expect RNA-seq to eventually phase out microarrays for transcriptomics studies. The question then becomes: just how soon?
"The bottom line is that RNA-seq is a digital technology. And because of that, the readout's a lot cleaner [because] you get the actual sequence, as opposed to some indirect readout that you get from hybridization," says Yale University's Mark Gerstein. With RNA-seq, "you get a much clearer sense of where things are transcribed … and you can do a lot of additional things that you couldn't even think about with microarrays."
More, more, more
In addition to increased sensitivity, RNA-seq data allows researchers to interrogate allele-specific expression, determine the structures of splice isoforms, garner mutation information, identify novel transcripts, examine gene fusion events, and better quantify gene expression levels, among other things. Even the most sophisticated arrays can't compete.
Keith Robison, a senior scientist at the Boston-based drug development firm Infinity Pharmaceuticals, says, "RNA-seq is getting closer and closer to beating microarrays on every performance criterion."
Shirley Liu, an associate professor at the Dana-Farber Cancer Institute, says that "in terms of data quality, there is no question — RNA-seq is better than microarrays." As a result, the field has been shifting away from arrays and toward RNA-seq.
With microarrays, "you can have a number of different isoforms that are simultaneously compatible with a given set of probes, so it's hard to discombobulate exactly how many of a particular isoform you have and exactly what the splice structure is," Gerstein says. "Whereas with RNA-seq you get the actual sequence, so if you have two isoforms that differ in a few bases being spliced differently, in practice, you can read that out."
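The probe-versus-read distinction Gerstein describes can be illustrated with a toy example. The Python sketch below assumes two hypothetical isoforms of a single gene that differ only in whether a middle exon is included; the isoform and exon names are invented for illustration and do not come from any real annotation. A probe lying inside a shared exon matches both isoforms, whereas a read spanning a splice junction is compatible with only one of them.

```python
# Hypothetical transcript models: each isoform is an ordered list of exon IDs.
ISOFORMS = {
    "iso_A": ["e1", "e2", "e3"],  # includes exon e2
    "iso_B": ["e1", "e3"],        # skips exon e2
}


def junctions(exons):
    """Return the splice junctions (adjacent exon pairs) of one isoform."""
    return {(a, b) for a, b in zip(exons, exons[1:])}


def isoforms_matching_probe(exon):
    """A probe inside a single exon hybridizes to every isoform containing that exon."""
    return sorted(name for name, exons in ISOFORMS.items() if exon in exons)


def isoforms_matching_read(junction):
    """A read spanning a splice junction fits only isoforms that contain that junction."""
    return sorted(name for name, exons in ISOFORMS.items() if junction in junctions(exons))


print(isoforms_matching_probe("e1"))         # ['iso_A', 'iso_B']: ambiguous
print(isoforms_matching_read(("e1", "e3")))  # ['iso_B']: the skipped-exon junction
print(isoforms_matching_read(("e1", "e2")))  # ['iso_A']
```

Real isoform quantification is far more involved, but the toy captures why a sequence read can settle a splice structure that a shared-exon probe cannot.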
In a June 2008 Genome Research paper, John Marioni and his colleagues compared sequencing with array-based gene-expression profiling experiments and concluded that sequencing was a "promising technology" and was "in some ways superior to existing array-based approaches."
Far from perfect
Cost and access, however, remain roadblocks for many labs interested in adopting the technique.
RNA-seq is, in theory, the obvious choice for transcriptomics studies — more data means more possibilities for analysis. In practice, however, it's not always the clear winner. Library construction, the add-on costs of RNA sample prep and sequencing reagents, and the sequencer itself all add up to a pricey production. Arnoud van Vliet from the Institute of Food Research in Norwich, UK, says that, in his experience, RNA-seq is more expensive than arrays when calculated per sample. "For the price of one RNA-seq experiment, it is possible to do 20 to 100 microarray samples," he adds.
Array-based methods also still rival sequencing in terms of throughput, van Vliet says.
"Typically, you can process a large number of samples relatively easily and in a short time," says Stanford University's Wing Wong. "This is something that sequencers cannot do."
Because RNA-seq is still a relatively new technique, it also faces methodological challenges. For example, Gerstein says, the accuracy of transcript reconstruction and expression level quantification could be significantly improved. According to Wong, RNA-seq protocols are subject to a variety of biases — products of anything from the amount of starting material used, to the implementation of poly-A tail tags, to where reads are physically sampled from the genome. "The sequence itself would cause bias if some spots [in the genome] are more likely to be sampled," he says.
Still, Infinity's Robison says that RNA-seq will eventually replace arrays. Right now, he says, "a lot of projects [are] already fired up, and they're going to keep going forward with array-based technologies [because] they already have the infrastructure and they already have their workflows going."
Dana-Farber's Liu says that researchers aren't ready to abandon arrays because they're familiar with the technology. "With gene-expression microarrays," she says, "people know exactly what they're going to get. The tools are very, very mature." Consequently, "if a smaller lab has a lot of samples to process, I think arrays [are] probably still the best way to go," she adds. "This is especially true if they don't have a sequencing facility and an informatics collaborator."
Too much information?
"RNA-seq [data is] both richer and substantially more complicated from an informatics processing standpoint," Yale's Gerstein says. "It's a much more involved processing task to get [useful information] out than just simply reading off the reds and greens from a microarray."
Most labs, even those with enough funding to send their samples to a sequencing core or to run their own sequencing experiments, are not yet equipped to handle RNA-seq data, Wong says. "A typical biology lab can easily submit 10 samples to the sequencing core and get 100 million reads from each sample. Then suddenly you have a billion reads, and each read is 200 base pairs. That's a lot of data. … And just the mapping and computation, most labs don't have the informatic and computational expertise to handle that," he says. "Even with software, they would need a strong computational person just to use [it] and a fast server to run it in reasonable time." On top of that, Wong says, data storage capacity is an issue for most institutions.
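Wong's figures are easy to check. The back-of-the-envelope Python sketch below uses his numbers for samples, reads and read length; the figure of roughly two bytes per base for uncompressed FASTQ-style text (one character for the base call, one for its quality score, ignoring read headers) is an assumption added for illustration, not something Wong cites.

```python
# Back-of-the-envelope data volume for the scenario Wong describes.
samples = 10
reads_per_sample = 100_000_000  # 100 million reads per sample
read_length_bp = 200            # bases per read

# Assumption (not from the article): ~2 bytes per base in uncompressed
# FASTQ-style text, one for the base call and one for its quality score,
# ignoring the per-read header lines.
bytes_per_base = 2

total_reads = samples * reads_per_sample
total_bases = total_reads * read_length_bp
approx_bytes = total_bases * bytes_per_base

print(f"{total_reads:,} reads")  # 1,000,000,000 reads
print(f"{total_bases:,} bases")  # 200,000,000,000 bases
print(f"~{approx_bytes / 1e9:,.0f} GB of raw text before compression")  # ~400 GB
```

Even under this rough assumption, a single modest experiment lands in the hundreds of gigabytes before any alignment files are written, which is the storage burden Wong points to.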
Analytical and storage issues aside, Infinity's Robison says that when it comes to RNA-seq data, the more, the better. "I guess some people worry they're going to [become] overwhelmed, but I always find it odd to say, 'I want less.' I never want less information," Robison says. "I can always dig through it later."
To that end, academic labs have been pumping informatics tools for RNA-seq data sets into the literature throughout the past year. Most recently, two papers published online in Nature Biotechnology in May describe tools for RNA-seq analysis, Cufflinks and Scripture, that apply alignment and assembly algorithms to short reads in a "splice-aware" manner. Gerstein and his colleagues developed FusionSeq to identify gene fusions in sequencing data sets, and Wong has published his SpliceMap method, which allows researchers to detect splicing events with improved sensitivity and specificity.
Even with several software tools in hand, labs still have to do some scripting and optimization of their own to fit those tools to their data, Dana-Farber's Liu says.
And when developing analytical tools, Wong says, "the difficult trade-off has always been how much computation you're willing to spend to get an increase of a small amount of sensitivity and specificity. … If you're willing to do a lot of computation, you may enhance things a little bit, but right now the amount of data is so huge that sometimes you cannot afford to do that."
Gerstein says he has already seen researchers use RNA-seq for straightforward gene-expression profiling experiments. They aren't interested in interpreting the "full richness of the data," such as splicing complexities, he says; rather, "people are generating these RNA-seq data sets that are quite huge ... and all they really want is just the gene expression levels."
According to Robison, some labs are "going to want all that complexity shaved away. … A lot of people are going to want — just up front — a tool that quickly changes that monstrosity into 'which genes are going up and down in these experiments?'"
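What Robison describes amounts to collapsing aligned reads into per-gene counts and comparing conditions. The deliberately naive Python sketch below assumes reads have already been assigned to genes and uses invented counts for one control and one treated sample. It normalizes each library to counts per million (CPM) and reports a log2 fold change per gene; the gene names, counts and pseudocount are hypothetical. It uses no replicates and no statistical test, so it cannot say whether any change is significant.

```python
import math


def cpm(counts):
    """Scale raw read counts to counts per million (library-size normalization)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Naive per-gene log2(treated / control) on CPM values; no replicates, no statistics."""
    c, t = cpm(control), cpm(treated)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in control}


# Hypothetical read counts per gene for one control and one treated sample.
control = {"geneA": 500, "geneB": 1500, "geneC": 8000}
treated = {"geneA": 2000, "geneB": 1500, "geneC": 6500}

for gene, lfc in log2_fold_changes(control, treated).items():
    direction = "up" if lfc > 0 else "down" if lfc < 0 else "flat"
    print(f"{gene}: log2FC={lfc:+.2f} ({direction})")
```

Dedicated differential-expression tools add proper count models and replicate-based variance estimates on top of this basic "up or down" summary.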
Researchers still haven't agreed on the number — and types — of replicates required to validate RNA-seq experiments. Liu says that there isn't a universally accepted standard — "some of the earlier published papers didn't even have duplicates. … [Scientists were] just profiling different tissues" and reporting their results.
"With arrays you may see differences in the literature on whether you still need technical replicates," Robison says. He maintains that with either method, investigators "really need biological replicates" at the very least.
Liu says that researchers are expected to run triplicates for ChIP-chip, whereas duplicates are the publication requirement for ChIP-seq.
History repeats itself
Just as ChIP-seq has superseded ChIP-chip for detecting histone marks and transcription factor binding, RNA-seq has the potential to replace array-based methods. In 2007, Liu says, some of her collaborators, staunch supporters of ChIP-chip, were unsure whether ChIP-seq experiments were providing more or better information. Eventually, she says, many researchers concluded that ChIP-seq was the superior technique.
Robison says the ChIP-chip/ChIP-seq transition snowballed among scientists, and he expects the migration from microarrays to RNA-seq to occur similarly. "It happened a year or so ago with ChIP-seq," he says, recalling a SeqAnswers thread in which users were commenting that "reviewers were starting to more or less reject — or seriously question — papers that relied on arrays for chromatin immunoprecipitation, because that's an area where arrays are inferior to sequencing. … That mind-switch had happened that you're doing this by the wrong method, and this is no longer acceptable."
"Expression profiling clearly has not made that switch, but I think people are starting to think it's not so distant in the future that the array market will just collapse," Robison says.
The IFR's van Vliet is not so sure. He suspects that, just as the electron microscope didn't replace the light microscope, RNA-seq and microarrays might best be used in tandem. Although the electron microscope "offers immensely superior resolution, often one doesn't need that resolution and just wants a quick look," he says. "In my research, we are using RNA-seq for the highest possible resolution of relatively few samples, and we use microarray technology for [the] analysis of many more samples. Jointly they are really informative."
Perhaps the most telling example of the switch is Stanford's Wong. Though he was considered a pioneer in using arrays for early transcriptomics studies on the strength of his publications in 2000, Wong has all but converted to RNA-seq since 2008.
Not if, but when
"I think the days of microarrays are pretty much almost over," Yale's Gerstein says. "Within a very short amount of time, I think that all gene expression experiments will be RNA-seq. … I've not seen any application where someone could say, 'I'd rather do this with a microarray than sequencing.'"
Infinity's Robison says that it's not a case of whether RNA-seq will phase out arrays, but how soon. "In general it's just a question of when microarrays are going to lose most of their business," he says, adding that if "I were an array equipment salesman, I'd be kind of depressed at the idea of trying to sell new scanners, new platforms, get people to start up a project using chips when sequencing's just getting so cheap." Robison expects that the arrival of third-generation sequencers in late 2010 will "be the beginning of the end for arrays."
Wong says that however quickly sequencing costs decline, the throughput microarrays can achieve will "ensure that the array will still be in use in the near future."
For his part, van Vliet expects the "relative ease and speed of microarray analysis" to keep the technique afloat for another five to 10 years.