Researchers from the Genome Sciences Centre at the British Columbia Cancer Agency have added a new method, Tag-seq, to the rapidly growing list of applications for next-generation sequencers that already includes genome sequencing, methylation analysis, RNA-seq, and ChIP-seq.
The approach, published online in Genome Research earlier this month, adapts protocols used for long serial analysis of gene expression, or LongSAGE, to the Illumina Genome Analyzer. According to its developers, it can generate two orders of magnitude more data than Sanger-based LongSAGE at considerably lower cost.
LongSAGE is a digital gene expression analysis method that uses 21-base-pair tags to identify genes and exons. Typically, tags from individual transcripts are ligated to form ditags that are then concatenated, cloned, and sequenced using capillary sequencing. Tag-seq, on the other hand, does not require ditag production and concatenation, and relies on the Illumina GA to sequence the tags.
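The "digital" nature of tag-based expression profiling can be illustrated with a minimal Python sketch: expression is estimated by counting how often each 21-base tag is observed. The sketch assumes that each read begins with the tag, which the article does not specify, and the function names count_tags and tags_per_million are hypothetical. It is not the authors' pipeline.

```python
from collections import Counter

TAG_LENGTH = 21  # tag length given in the article


def count_tags(reads, tag_length=TAG_LENGTH):
    """Count tags, taking the first `tag_length` bases of each read.

    Assumption (not from the article): the tag sits at the start of the read.
    """
    counts = Counter()
    for read in reads:
        tag = read[:tag_length].upper()
        # Skip reads that are too short or contain ambiguous bases such as N.
        if len(tag) == tag_length and set(tag) <= set("ACGT"):
            counts[tag] += 1
    return counts


def tags_per_million(counts):
    """Scale raw counts to tags per million so libraries of different depth compare."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    reads = ["CATG" + "A" * 17 + "GG", "CATG" + "A" * 17, "CATG" + "C" * 17]
    for tag, tpm in tags_per_million(count_tags(reads)).items():
        print(tag, round(tpm))
```

Because each observation is a discrete count, a deeper library simply yields larger counts; mapping tags to genes is a separate step that is not shown here.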
The authors report in the Genome Research paper that a Tag-seq library is typically sequenced to a depth of 10 million tags, "which represents an increase of two orders of magnitude over the sequencing depth of a typical LongSAGE library." In addition, Sorana Morrissy, the first author on the paper, told In Sequence that the method also offers a two-orders-of-magnitude improvement in cost compared to LongSAGE.
A Tag-seq library of 10 million tags costs several thousand dollars to create, she said, while a similarly sized library for LongSAGE would have cost $600,000, "which is completely cost prohibitive" for analyzing large numbers of samples.
Tag-seq also offers performance advantages over LongSAGE, she said. "Because we're able to sample so deeply, we actually find that there is a distinct class of transcripts that we were previously unable to detect in LongSAGE libraries." This class is enriched in antisense transcripts and in transcripts for transcription factors, which tend to be expressed at very low levels, so measuring their differential expression between tissue types has so far been difficult.
"Now, because we have lots of evidence for expression of these transcription factors, we can look, say, between cancer and normal, and we can say, 'Aha, this transcription factor is much more highly expressed or differentially regulated.'"
Morrissy said that Tag-seq is complementary to RNA-seq, which has "slightly different strengths." In particular, "it's able to reveal transcript structure … whereas Tag-seq is better at profiling exact expression."
Also unlike RNA-seq, Tag-seq is strand-specific. "The Illumina reads themselves don't have any strand information, but the Tag-seq method reintroduces that information when we make the libraries," she said. "So we're able to tell if a transcript is being transcribed from the sense or the antisense strand."
This provides a "major advantage" for cancer research over RNA-seq because it enables measurement of sense and antisense gene expression. "That's important to cancer research … because sense/antisense transcripts are quite prevalent in the genome, and novel antisense transcription has been detected for up to 75 percent of genes, and antisense transcripts have been shown to be implicated in disease processes," she said.
The BC team tested the method on samples from the Cancer Genome Anatomy Project, which has already used Sanger-based LongSAGE to measure gene expression profiles for a range of cancer cells and tissues. The researchers generated 35 Tag-seq libraries from cancer and normal tissue samples and compared these to 77 LongSAGE libraries.
Among a number of findings, they determined that Tag-seq is "well suited to the study of cancer-relevant gene expression in the context of the CGAP project," and found a number of known and novel sense-antisense gene pairs "for which the ratio of expression changed significantly between cancer subtypes or between cancer and normal states. These were enriched in known cancer-related genes, supporting a role for antisense transcription in cancer biology."
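To make the idea of a changing sense-antisense ratio concrete, the following Python sketch labels a mapped tag by orientation and compares the log2 ratio of sense to antisense counts between two samples. It is an illustration under stated assumptions, not the statistical test used in the paper: the convention that a tag on the gene's strand counts as "sense", the pseudocount of 1, and all function names are hypothetical.

```python
import math


def orientation(tag_strand, gene_strand):
    """Label a mapped tag relative to an annotated gene.

    Assumption: a tag mapping to the gene's own strand is 'sense';
    the real convention depends on the library protocol.
    """
    return "sense" if tag_strand == gene_strand else "antisense"


def log2_sense_antisense_ratio(sense_count, antisense_count, pseudocount=1.0):
    """log2 of sense over antisense counts; the pseudocount avoids division by zero."""
    return math.log2((sense_count + pseudocount) / (antisense_count + pseudocount))


def ratio_shift(cancer, normal):
    """Change in log2 ratio between two samples, each a (sense, antisense) pair."""
    return log2_sense_antisense_ratio(*cancer) - log2_sense_antisense_ratio(*normal)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(orientation("-", "+"))
    print(ratio_shift(cancer=(7, 1), normal=(1, 1)))
```

A shift far from zero would flag a gene pair for closer inspection; deciding whether the shift is significant requires a proper statistical model, which this sketch does not attempt.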
Morrissy said that the BC team is currently looking at using Tag-seq and RNA-seq together, but it is too early to report results from that effort. "RNA-seq can give you transcript structure, and Tag-seq gives you strand-specific, accurate expression profiling, so really you could do both on the same sample and gain a lot more information."
For example, she said, "in RNA-seq you might have a sense-antisense gene pair [where] you don't necessarily know which strand is being expressed, so with Tag-seq, we're hoping to resolve that ambiguity."
The BC researchers also found that Tag-seq compared favorably to microarrays because there is no risk of cross-hybridization and the method is not dependent on probe design and can therefore detect novel genes. In addition, Morrissy said that the dynamic range of Tag-seq — its ability to detect genes with very low and very high expression levels — is much better than that of arrays.
"You can distinguish a gene that is very infrequently expressed from one that is very highly expressed with much better resolution than on an array, because on an array … there are X number of probes, and once they're all bound, you don't get an increase in signal. Your signal has basically flattened out at that point," she said. "Whereas with Tag-seq, the only limit to the dynamic range is the number of sequences you're generating. If you just sequence further, you'll find genes that are less frequently expressed."
In the paper, the authors note that the dynamic range of Tag-seq was 13-fold better than that of Affymetrix arrays.
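The point that sequencing further uncovers rarer genes can be sketched with a simple sampling model. The Python snippet below assumes that tags are drawn independently and that a transcript contributes a fixed fraction of the tag pool; both are simplifications, and the function name detection_probability is hypothetical. The two depths used are the 10 million tags cited in the article and a library two orders of magnitude smaller.

```python
import math


def detection_probability(fraction, depth):
    """Probability of seeing at least one tag from a transcript.

    `fraction` is the transcript's share of all tags and `depth` is the number
    of tags sequenced. Assumes independent sampling: 1 - (1 - fraction) ** depth.
    """
    if fraction <= 0.0 or depth <= 0:
        return 0.0
    if fraction >= 1.0:
        return 1.0
    # log1p and expm1 keep the result accurate for very small fractions.
    return -math.expm1(depth * math.log1p(-fraction))


if __name__ == "__main__":
    rare = 1e-6  # a transcript at one tag per million (illustrative value)
    for depth in (100_000, 10_000_000):
        print(depth, detection_probability(rare, depth))
```

Under this model, a transcript present at one tag per million is usually missed at 100,000 tags but is almost always seen at 10 million, in line with Morrissy's remark that depth is the only limit on the low end of the range.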
The BC paper follows a study published in BMC Genomics last year by researchers at France's Université Claude Bernard Lyon, in which they created and compared two tag libraries from the hypothalamus of adult male mice: one constructed by Sanger-based LongSAGE and another by Illumina GA-based LongSAGE.
The authors concluded that Illumina sequencing "is well adapted to the sequencing of LongSAGE tags," and that the combination of LongSAGE and Illumina sequencing "is therefore perfectly suited for deep transcriptome analysis."
The BC authors note in their paper that their work "extends" these findings "by reporting for the first time that with increasing depth, Tag-seq also allowed detection of a distinct subset of transcriptome space, enriched in AT-rich genes, intronic tags, antisense tags, and novel intergenic tags."
In addition, Morrissy noted that the BC paper is the first "robust" comparison of LongSAGE and Tag-seq, since the prior paper compared only two libraries while the BC researchers compared 35 Tag-seq libraries and 77 LongSAGE libraries.
Unlike many new applications for next-gen sequencing technology, Tag-seq does not require the development of new software tools to analyze the data, according to Morrissy.
"This data can be analyzed in the same way that LongSAGE data has been analyzed," she said. "There are lots of tools out there for this kind of analysis, so nothing would have to be developed in that department."
In addition, "anyone with an Illumina sequencer would be able to use it," Morrissy said.
While the BC team developed the Tag-seq method for the Illumina GA, the concept could work with other next-generation sequencing platforms with some redevelopment, though Morrissy stressed that technical comparisons with Tag-seq, RNA-seq, and microarrays would be necessary.
The application does, however, offer a rare instance in which short-read technologies, including the Illumina GA and Applied Biosystems' SOLiD, have an advantage over longer-read platforms. Unlike applications that require assembly, where longer reads are preferable, Tag-seq only requires 21-base-pair reads, so the Illumina system is more than sufficient without any further improvements, Morrissy said.