By Julia Karow
Researchers at Stanford University have developed a sequencing-based method for quantifying gene expression at a genome-wide level that works well in both formalin-fixed paraffin-embedded and frozen tumor samples, thus opening up large tumor archives for gene expression studies.
In a proof-of-concept study published last week in PLoS One, the method, called 3'-end sequencing for expression quantification, or 3SEQ, performed much better than microarrays in profiling gene expression in FFPE samples of two soft tissue tumor types.
The vast majority of archived tumor samples are FFPE samples, but the RNA in these is usually fragmented and degraded. For that reason, microarrays have "never worked well at all" for profiling gene expression in these samples, according to Arend Sidow, an associate professor of pathology and genetics at the Stanford University Medical Center and a senior author of the study.
According to the paper, other methods — including RT-PCR and cDNA-mediated annealing, selection, extension, and ligation — are able to profile the expression of limited numbers of genes in FFPE samples, but none of them works at a genome-wide level.
Beyond their size, FFPE tumor archives are a "treasure trove" because they tend to go back many years and have clinical records associated with them, Sidow explained. "We have, literally, thousands of FFPE samples just in the archives at Stanford" that could now be analyzed, he said.
3SEQ works well in these kinds of samples, he and his colleagues found. The method targets the 3' end of mRNA and generates directional cDNA sequencing libraries in which each fragment contains a portion of the poly-A tail followed by about 200 base pairs of upstream sequence.
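To illustrate what a read from such a library might look like downstream, here is a minimal sketch, not the Stanford group's pipeline. It assumes a hypothetical layout in which each read starts inside the tail, so it begins with a run of T's (the complement of poly-A), followed by antisense transcript sequence. The function name and the `min_tail` threshold are illustrative assumptions.

```python
from typing import Optional

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_tail_and_orient(read: str, min_tail: int = 8) -> Optional[str]:
    """Strip a leading poly-T run and return the rest in sense orientation.

    Assumption (hypothetical layout): the read begins inside the tail,
    so the tail appears as T's at the start of the read. Reads whose
    tail run is shorter than min_tail, or that are all tail, return None.
    """
    i = 0
    while i < len(read) and read[i] == "T":
        i += 1
    if i < min_tail or i == len(read):
        return None
    # The remaining bases are antisense; flip them to transcript orientation.
    return reverse_complement(read[i:])
```

For example, `trim_tail_and_orient("T" * 10 + "GGCATC")` returns `"GATGCC"`, the sense-strand sequence sitting just upstream of the tail. Rejecting reads without a clear tail is one simple way to keep only fragments that genuinely mark a 3' end.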
The main difference between standard RNA-seq and 3SEQ is that the former targets the entire length of each transcript, so it requires high-quality starting RNA and yields more reads from longer transcripts, whereas 3SEQ is designed to produce only a single read per transcript, regardless of the transcript's length, and therefore does not suffer from length bias.
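The length-bias point can be made concrete with a toy calculation. The sketch below is idealized: it assumes every molecule is sampled, that full-length RNA-seq shears each transcript into fragments of a hypothetical average length of 200 bases, and that 3' end counting yields exactly one fragment per molecule. The gene names and numbers are invented for illustration.

```python
FRAGMENT_LEN = 200  # assumed average fragment length for full-length RNA-seq


def expected_rnaseq_fragments(molecules: int, transcript_len: int) -> float:
    """Full-length RNA-seq: each molecule yields about
    transcript_len / FRAGMENT_LEN fragments, so counts scale with length."""
    return molecules * transcript_len / FRAGMENT_LEN


def expected_3seq_reads(molecules: int) -> int:
    """3' end counting: one fragment per molecule, independent of length."""
    return molecules


# Two hypothetical genes with the same number of mRNA molecules.
genes = {"short_gene": (100, 1_000), "long_gene": (100, 4_000)}
for name, (molecules, length) in genes.items():
    print(name, expected_rnaseq_fragments(molecules, length), expected_3seq_reads(molecules))
```

Both genes are equally expressed, yet full-length counting gives the longer one four times as many fragments (2,000 versus 500), whereas the 3' end counts are 100 for each. Standard RNA-seq analyses correct for this by normalizing to transcript length; 3' end counting avoids the need for that step.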
Sidow said that another advantage of 3SEQ, which is not stressed much in the paper, is its ability to quantify transcripts that have not yet been annotated in the genome — a capability that would also apply to studies of intact RNA.
"You don't need [to know] your gene to quantify [it], which is an absolute necessity for standard RNA-seq [where you] need to aggregate the reads across the annotated transcript in order to quantify it," he said. "Here, what you can do is map against the genome and find peaks of signals." This application of 3SEQ, he added, "is currently in the works, and we think that it will lead to much better quantification and discovery."
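Sidow's idea of mapping to the genome and finding "peaks of signals" can be sketched with a simple gap-based clustering of mapped 3' end positions. This is an assumption-laden illustration, not the analysis the group says is in the works: it treats positions from a single chromosome strand, and the `max_gap` and `min_reads` values are arbitrary.

```python
from typing import List, Tuple


def call_peaks(positions: List[int], max_gap: int = 50,
               min_reads: int = 3) -> List[Tuple[int, int, int]]:
    """Cluster read-end positions and report dense clusters as peaks.

    Sorted positions no more than max_gap bp apart join the same cluster.
    Clusters with at least min_reads reads are returned as
    (start, end, read_count). Assumes one chromosome and one strand.
    """
    peaks: List[Tuple[int, int, int]] = []
    if not positions:
        return peaks
    pos = sorted(positions)
    start = prev = pos[0]
    count = 1
    for p in pos[1:]:
        if p - prev <= max_gap:
            count += 1  # still inside the current cluster
        else:
            if count >= min_reads:
                peaks.append((start, prev, count))
            start, count = p, 1  # open a new cluster
        prev = p
    if count >= min_reads:
        peaks.append((start, prev, count))
    return peaks
```

With positions `[100, 110, 120, 400, 1000, 1010, 1020, 1030]`, this reports two peaks, `(100, 120, 3)` and `(1000, 1030, 4)`, and drops the lone read at 400. Each peak's read count then serves as an expression estimate for whatever transcript ends there, annotated or not.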
The fact that 3SEQ restricts the analysis to a small portion of each transcript is also the main limitation of the method. For example, it is not very useful for identifying splice isoforms or discovering polymorphisms, Sidow said.
He and his colleagues are currently working on improving analytical methods for 3SEQ data, which he believes will "leverage the signal much, much better."
In their paper, the Stanford researchers compared 3SEQ to oligonucleotide gene expression microarrays on a total of 23 frozen or FFPE samples of two subtypes of fibroblastic soft tissue tumors. These cancers, which consist mostly of tumor cells with few contaminating normal cells, are morphologically similar but have distinct clinical features and have been shown in the past to have distinct gene expression patterns.
3SEQ found that about 8,100 genes were differentially expressed between the two tumor subtypes in FFPE tissues, and about 9,600 in frozen tissues, whereas the microarray data identified only 69 differentially expressed genes in FFPE tissues and 4,600 in frozen tissues.
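How many genes are called "differentially expressed" depends on a multiple-testing threshold applied to per-gene p-values. The article does not say which statistical test or cutoff the study used; the sketch below shows the Benjamini-Hochberg false-discovery-rate procedure purely as one common, illustrative choice, with made-up p-values.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.05) -> List[bool]:
    """Mark which tests are significant at the given false discovery rate.

    Benjamini-Hochberg step-up: sort p-values, find the largest rank k
    (1-based) with p_(k) <= k / m * fdr, and call the k smallest significant.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest passing rank (step-up)
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant
```

Note the step-up behavior: for p-values `[0.02, 0.025, 0.03]`, the smallest fails its own threshold (0.0167), but the largest passes its threshold (0.05), so all three are called significant. A more sensitive assay shifts many genes' p-values downward, which is how the same threshold can yield thousands of calls from one platform and dozens from another.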
An analysis of the 3SEQ results identified a number of biological pathways involved in tumor formation that the array data did not detect.
The scientists sequenced their 3SEQ libraries using an Illumina Genome Analyzer II with 36-base reads, though Sidow said that they are now also using the Applied Biosystems SOLiD.
In principle, 3SEQ could be automated because a gel-free, bead-based version of the protocol exists, he said, though the team has not yet explored that option.
In the meantime, he and his colleagues have been applying their method to larger studies involving several hundred samples in different cancer types.
"It is a very good study and an important step forward towards the use of clinically relevant tissue material," Michal Schweiger, a researcher at the Max Planck Institute for Molecular Genetics in Berlin, told In Sequence by e-mail. Last year, she and her colleagues published a study in which they sequenced genomic DNA from FFPE tumor samples using Illumina's Genome Analyzer and detected several types of variation (see In Sequence 6/2/2009).
"It seems that the [next-generation sequencing] approach can be very well applied to stored FFPE material and that the sensitivity increases significantly" compared to arrays, Schweiger said. One drawback, she said, is that the 3' end sequencing approach generates a 3' bias, and splice variants cannot be analyzed.
3SEQ is also not the first method to sequence short stretches of polyadenylated RNA near the 3' end in order to quantify gene expression: Last year, for example, a group at the British Columbia Cancer Agency published a similar method, called Tag-seq, that is also directional and generates sequence reads next to the 3'-most cleavage site of a certain restriction enzyme (see In Sequence 6/30/2009).
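The Tag-seq principle of reading next to the 3'-most cleavage site can be sketched on a sense-strand transcript sequence. The article does not name the enzyme; the four-base site `CATG` used here is an assumption for illustration, as is taking the tag to be the 21 bases starting at the site (matching the 21-base reads mentioned below).

```python
from typing import Optional

SITE = "CATG"  # assumed recognition site, for illustration only
TAG_LEN = 21   # tag length, matching Tag-seq's 21-base reads


def extract_tag(transcript: str, site: str = SITE,
                tag_len: int = TAG_LEN) -> Optional[str]:
    """Return the tag starting at the 3'-most occurrence of `site`.

    Returns None if the transcript has no site, or if fewer than
    tag_len bases remain from the site to the end of the sequence.
    """
    pos = transcript.rfind(site)  # rfind locates the 3'-most occurrence
    if pos == -1 or pos + tag_len > len(transcript):
        return None
    return transcript[pos:pos + tag_len]
```

Because only the 3'-most site is used, each transcript contributes at most one tag, which gives Tag-seq the same one-count-per-molecule property as 3SEQ.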
"While 3SEQ and Tag-seq reads are equivalent in many respects, one aspect of 3SEQ that stands out is the potential for increased read lengths made possible by advances in next-generation sequencing technology," said Sorana Morrissy, a researcher at the Genome Sciences Centre at the BCCA and the first author on the Tag-seq paper. Tag-seq generates 21-base-pair reads, whereas 3SEQ could be further improved as sequencers produce longer reads.
Morrissy told In Sequence by e-mail that so far, she and her colleagues have only used Tag-seq to analyze libraries from fresh frozen tissue samples. But she would expect Tag-seq to work with degraded RNA samples as well, since the vast majority of transcripts have the required restriction enzyme site within 400 base pairs of the poly-A tail.
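Morrissy's reasoning can be expressed as a simple check. If degraded RNA retains only the last few hundred bases before the poly-A tail, a tag is recoverable only when the 3'-most site lies entirely inside that window. The sketch assumes the transcript sequence ends at the poly-A addition point, reuses the hypothetical `CATG` site, and takes the 400-base window from Morrissy's figure; the test sequences are invented.

```python
from typing import List, Optional


def site_distance_from_3prime_end(transcript: str, site: str = "CATG") -> Optional[int]:
    """Bases from the start of the 3'-most site to the transcript's 3' end,
    or None if the site is absent. Assumes the sequence ends at the tail."""
    pos = transcript.rfind(site)
    return None if pos == -1 else len(transcript) - pos


def survives_degradation(transcript: str, retained_3prime_len: int = 400,
                         site: str = "CATG") -> bool:
    """True if a fragment keeping only the last retained_3prime_len bases
    still contains the whole 3'-most site."""
    d = site_distance_from_3prime_end(transcript, site)
    return d is not None and d <= retained_3prime_len


def fraction_tag_accessible(transcripts: List[str], retained_3prime_len: int = 400,
                            site: str = "CATG") -> float:
    """Fraction of transcripts whose tag survives the truncation."""
    if not transcripts:
        return 0.0
    hits = sum(survives_degradation(t, retained_3prime_len, site) for t in transcripts)
    return hits / len(transcripts)
```

Run over a real transcript annotation, `fraction_tag_accessible` would estimate the proportion Morrissy describes; on the three toy sequences in the test below it returns one third.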
"Applying these methods to the existing collections of archival tumor samples should generate a wealth of valuable gene expression data from which to draw insights into cancer biology," she said.