By Monica Heger
Researchers have combined serial analysis of gene expression, or SAGE, with high-throughput sequencing to profile gene expression in both normal and cancer samples of human breast tissue. The method, published last week in Genome Research, is similar to RNA-seq but requires less total sequencing, although it also yields less data.
SAGE-seq combines an established method for analyzing gene expression with next-gen sequencing. It is similar to RNA-seq, except that in one step the cDNA is cut with a restriction enzyme and a linker is added. A second restriction enzyme then cuts 21 base pairs downstream of the first enzyme's site, so that each SAGE-seq tag begins with the same four bases, followed by a 17-base-pair sequence that is unique to a particular transcript.
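To make the tag structure concrete, here is a minimal Python sketch that takes the 21-bp tag (four shared bases plus 17 transcript-specific bases) from the start of each read and counts the 17-bp portions. It assumes the shared prefix is CATG, the NlaIII recognition site typically used as the anchoring enzyme in SAGE, and that each single-end read begins with its tag. The names `extract_tag` and `count_tags` are hypothetical and do not come from the study.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumption: the shared four-base prefix is CATG (NlaIII site); the article
# only says that every tag begins with the same four bases.
ANCHOR = "CATG"
UNIQUE_LEN = 17                      # transcript-specific part of each tag
TAG_LEN = len(ANCHOR) + UNIQUE_LEN   # 21 bp in total


def extract_tag(read: str) -> Optional[str]:
    """Return the 21-bp tag at the start of a read, or None if malformed.

    Hypothetical read layout: the tag occupies the first 21 bases.
    """
    tag = read[:TAG_LEN].upper()
    if len(tag) < TAG_LEN or not tag.startswith(ANCHOR) or "N" in tag:
        return None
    return tag


def count_tags(reads: Iterable[str]) -> Counter:
    """Count the 17-bp transcript-specific portion of every valid tag."""
    counts: Counter = Counter()
    for read in reads:
        tag = extract_tag(read)
        if tag is not None:
            counts[tag[len(ANCHOR):]] += 1
    return counts


if __name__ == "__main__":
    unique = "AAACCCGGGTTTACGTA"  # hypothetical 17-bp transcript tag
    reads = [
        ANCHOR + unique + "CGT",  # valid tag followed by extra bases
        ANCHOR + unique + "GGG",  # second copy of the same tag
        "GATC" + unique,          # wrong prefix, discarded
        "CATGAAAC",               # too short, discarded
    ]
    print(count_tags(reads))
```

Because every tag has the same length and the same prefix, counting reduces to slicing and tallying; no alignment across a full-length transcript is involved.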
The aim of the method is to simplify data analysis and reduce the amount of sequencing required relative to RNA-seq, while still generating a comprehensive overview of gene expression.
"The difference between RNA-seq and this is that we're sequencing a defined position on the cDNA," said Kornelia Polyak, an associate professor of medicine at the Dana-Farber Cancer Institute and a senior author of the study. "Instead of sequencing the whole transcript, we're just sequencing one region."
In the Genome Research study, the researchers generated their libraries from 50,000 to 100,000 uncultured mammary epithelial cells isolated from the breast tissue of seven healthy women and from seven breast tumors. They sequenced the samples on the Illumina Genome Analyzer using a single-end sequencing strategy. Each sample yielded between 130,000 and 650,000 unique tags, and between 1 million and around 13 million total tags.
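The difference between unique and total tags can be expressed in a few lines of Python. This sketch reads "unique" as the number of distinct tag sequences and "total" as every tag observed, duplicates included; that reading is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def tag_library_stats(tags: Iterable[str]) -> Tuple[int, int]:
    """Return (unique_tags, total_tags) for one library.

    unique_tags: number of distinct tag sequences observed.
    total_tags:  number of tags observed, counting every duplicate.
    """
    counts = Counter(tags)
    return len(counts), sum(counts.values())


if __name__ == "__main__":
    # Hypothetical library: one highly expressed tag and one rare tag.
    print(tag_library_stats(["AAAC", "AAAC", "AAAC", "CCGT"]))
```

A highly expressed transcript contributes many copies of the same tag, which is why total tag counts run several times higher than unique tag counts.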
The researchers used the method to profile both normal and breast cancer transcriptomes, and compared their results to traditional SAGE. They found that SAGE-seq detected 20 times more differentially expressed genes, including less abundant genes and genes that encode known breast cancer-related transcription factors. SAGE-seq also identified three times as many pathways activated in breast cancer as traditional SAGE did, including the androgen receptor signaling pathway and the BRCA1-mediated pathway, both of which SAGE missed.
"SAGE-seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease," the authors wrote.
The method was able to detect lowly expressed genes, including transcription factors. It detected around 1,300 of the 1,658 transcription factors in the human genome, or roughly 78 percent.
The researchers also found that the number of newly detected transcripts plateaued at around 10 million reads, suggesting that about 10 million reads per library is an ideal sequencing depth for a comprehensive overview of the transcriptome, with 5 million reads per library as a minimum.
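A plateau like this is what a saturation (rarefaction) analysis looks for. The Python sketch below shows one generic way to build such a curve, not necessarily the procedure used in the study: subsample a library to increasing depths, count the distinct tags seen at each depth, and report the first depth after which new tags arrive more slowly than a chosen rate. The function names, the threshold, and the simulated library are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def saturation_curve(
    tags: Sequence[str], depths: Sequence[int], seed: int = 0
) -> List[Tuple[int, int]]:
    """Distinct tags observed when the library is subsampled to each depth.

    The tags are shuffled once and read in prefixes, so each depth is a
    random subsample without replacement and larger depths nest smaller ones.
    """
    rng = random.Random(seed)
    shuffled = list(tags)
    rng.shuffle(shuffled)
    seen = set()
    prev = 0
    curve = []
    for depth in sorted(depths):
        depth = min(depth, len(shuffled))     # cannot sample past the library
        seen.update(shuffled[prev:depth])     # add only the newly sampled reads
        prev = depth
        curve.append((depth, len(seen)))
    return curve


def plateau_depth(
    curve: List[Tuple[int, int]], min_new_per_read: float
) -> Optional[int]:
    """First depth after which new distinct tags accrue below the given rate."""
    for (d0, n0), (d1, n1) in zip(curve, curve[1:]):
        if d1 > d0 and (n1 - n0) / (d1 - d0) < min_new_per_read:
            return d0
    return None


if __name__ == "__main__":
    # Hypothetical library: 2,000 transcripts with Zipf-like abundances.
    rng = random.Random(1)
    transcripts = [f"T{i}" for i in range(2000)]
    weights = [1 / (i + 1) for i in range(2000)]
    library = rng.choices(transcripts, weights=weights, k=50_000)
    curve = saturation_curve(library, [1_000, 5_000, 10_000, 25_000, 50_000])
    for depth, distinct in curve:
        print(depth, distinct)
    print("plateau near", plateau_depth(curve, 0.01))
```

Nesting the subsamples (one shuffle, growing prefixes) keeps the curve monotonic, so any flattening reflects the library rather than sampling noise between independent draws. The rate passed to `plateau_depth` is arbitrary and would need to be chosen for a real dataset.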
Although the authors did not compare the method directly to RNA-seq in this study, Polyak said that one main advantage of SAGE-seq is its easier computation.
Because all the tags are the same length, researchers don't have to account for transcript length, Polyak said. That also means less total sequencing is needed, which reduces cost and lets researchers use less starting material than RNA-seq requires.
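The article does not say how the study normalized its counts. As a hedged illustration of why length matters for RNA-seq but not for fixed-length tags, the Python sketch below contrasts RPKM (reads per kilobase of transcript per million mapped reads, a common length-aware RNA-seq measure) with a simple tags-per-million scaling. The example numbers are hypothetical, and the tag case assumes each transcript copy yields one tag.

```python
from typing import Dict


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """RNA-seq style: reads per kilobase of transcript per million mapped reads.

    Longer transcripts yield more fragments, so counts are divided by length.
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def tags_per_million(tag_counts: Dict[str, int]) -> Dict[str, float]:
    """Tag-based style: fixed-length tags need only library-size scaling,
    so no transcript-length term appears."""
    total = sum(tag_counts.values())
    return {tag: count * 1e6 / total for tag, count in tag_counts.items()}


if __name__ == "__main__":
    # Hypothetical: two equally expressed transcripts, 1 kb and 4 kb long.
    # In RNA-seq the 4 kb transcript collects about four times the reads,
    # and RPKM divides that difference back out.
    print(rpkm(100, 1_000, 1_000_000), rpkm(400, 4_000, 1_000_000))
    # Assuming one tag per transcript copy, equal abundance gives equal tags.
    print(tags_per_million({"tag_short": 250, "tag_long": 250}))
```

In the RNA-seq case, the raw counts differ fourfold even though expression is equal, so a length correction is required; in the tag case, the counts are already comparable.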
For example, in this experiment, the team started with as few as 50,000 cells from a human tissue sample, while an RNA-seq experiment would require around five times as much starting material. Also, each sample was sequenced on one lane of the Illumina GA, whereas for RNA-seq experiments, researchers typically use multiple lanes per sample, said Polyak.
Polyak said that the team is now working on a head-to-head comparison of SAGE-seq and RNA-seq. The team is also using SAGE-seq to characterize the transcriptomes of different cell types, such as stem cells and cells at various stages of differentiation, in both healthy and tumor breast tissue. "We're trying to identify differences in expression that correlate with differences in risk," she said.
While the technique may be a lower-cost, simpler method for measuring gene expression than RNA-seq, unlike RNA-seq it cannot detect variants outside the tagged region. And because sequencing costs continue to fall, the cost difference could eventually become negligible. But for now, Polyak said, SAGE-seq is a good, reliable method for evaluating gene expression in both normal and disease tissue.
Have topics you'd like to see covered in In Sequence? Contact the editor at mheger [at] genomeweb [dot] com.