By Monica Heger
Using an RNA sequencing technique that allowed them to identify transcriptional direction, researchers at the Mayo Clinic and Life Technologies have sequenced the transcriptomes of three oral cancer tumors and normal tissue on Applied Biosystems' SOLiD.
The technique, combined with whole-genome sequencing, allowed the researchers to identify allelic imbalances and show that they are associated with copy number changes.
"The strategy of constructing and looking at the transcriptome was done in such a way so you could ascertain the direction of each of the transcripts," said David Smith, professor of laboratory medicine and pathology at the Mayo Clinic and senior author of a paper describing the method that was published in PLoS ONE last week. "We can see much more comprehensively the transcription in the normal and tumor samples."
For the transcriptome sequencing, the researchers deviated from most standard RNA-seq protocols by starting with total RNA instead of poly-A purified RNA — a decision that allowed them to analyze both coding and non-coding RNA.
Another difference between their protocol and others is that most RNA-seq studies begin with a reverse transcription step, said Francisco De La Vega, scientific fellow for computational genomics research at Life Technologies. "That typically biases the distribution of DNA to the 3' end because the enzyme tends to fall off" before transcription is complete. "We wanted a method that could give us a full representation of the complete transcript and have strand specificity," he said.
The scientists first fragmented the RNA and then ligated double-stranded DNA adapters to both the 3' and 5' ends of the RNA fragments, which gave them strand specificity because the adapters allowed them to see the direction of transcription. Because reverse transcription was then performed only on small fragments of about 100 to 150 base pairs, they were able to reverse-transcribe each fragment in full. They then amplified their library with emulsion PCR and sequenced it using 50-base-pair fragment reads.
The researchers generated around 200 million reads per sample, but only 20 to 40 million of those aligned uniquely to the genome. De La Vega said this was because they began with total RNA instead of poly-A purified RNA. They tried to remove the ribosomal RNA before sequencing, because its sequences are repeated in the genome and do not align uniquely, but were unable to remove all of it.
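As a rough check on those figures, the uniquely aligned share can be worked out directly. The short Python sketch below uses only the rounded read counts quoted above; the function name is illustrative and is not taken from the study.

```python
def unique_fraction(unique_reads, total_reads):
    """Return the share of reads that aligned uniquely to the genome."""
    return unique_reads / total_reads


# Rounded per-sample figures as reported in the article.
total = 200_000_000
low = unique_fraction(20_000_000, total)
high = unique_fraction(40_000_000, total)

print(f"{low:.0%} to {high:.0%} of reads aligned uniquely")
```

On these numbers, roughly 10 to 20 percent of the reads in each sample aligned uniquely.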
Joshua Levin, a research scientist at the Broad Institute's genome sequencing and analysis program who was not involved with the project, said the RNA was highly degraded, likely because it came from a tumor and not a cell line. That degradation, he said, contributed to the researchers' inability to remove all the ribosomal RNA before sequencing and resulted in fewer mapped reads.
For the whole-genome sequencing, the researchers constructed mate-paired libraries with a 2.5-kilobase insert size and obtained 25-base mate-pair reads at around 8-fold coverage of the genome.
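To give a sense of scale, fold coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The Python sketch below applies that relationship to the figures above. It is a back-of-the-envelope estimate, not a number reported by the researchers: the 3.1-billion-base genome size is an assumption, every read is assumed to map, and the function name is illustrative.

```python
def mate_pairs_needed(coverage, genome_size, tag_length, tags_per_pair=2):
    """Estimate the mate pairs needed for a target fold coverage.

    Uses coverage = (pairs * bases per pair) / genome size and
    assumes every sequenced base maps to the genome.
    """
    bases_per_pair = tag_length * tags_per_pair
    return coverage * genome_size / bases_per_pair


GENOME_SIZE = 3.1e9  # assumed human genome size in bases; not from the article

pairs = mate_pairs_needed(coverage=8, genome_size=GENOME_SIZE, tag_length=25)
print(f"about {pairs / 1e6:.0f} million mate pairs")
```

Under those assumptions, 8-fold coverage with 25-base mate pairs corresponds to something on the order of 500 million mapped mate pairs.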
De La Vega estimated that the experiment, which was done about 18 months ago, cost around $180,000 for reagents, but thought it would cost around $40,000 today.
The researchers were able to identify areas where one allele was preferentially expressed over the other. When they examined those regions across the genome, it often turned out that there was a copy number variation in the DNA that coded for the RNA. "So what you have is a whole new mechanism to reveal important genes that are involved in cancer," said Mayo's Smith.
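The article does not describe the statistics the researchers used to call preferential expression. One common way to frame the question, offered here only as an illustrative sketch, is to count RNA reads carrying each allele at a heterozygous site and test whether the split departs from the 50/50 expected under balanced expression. The Python below does this with an exact binomial test; the function names and read counts are hypothetical.

```python
from math import comb


def allelic_ratio(ref_count, alt_count):
    """Fraction of reads at a site that carry the reference allele."""
    return ref_count / (ref_count + alt_count)


def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for the observed allele counts.

    Sums the probability of every outcome no more likely than the
    observed one. Intended for modest read counts (up to several hundred).
    """
    n = ref_count + alt_count
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # Small tolerance so ties with the observed probability are included.
    tail = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at one heterozygous site (not data from the study).
ref, alt = 45, 15
print(f"ratio={allelic_ratio(ref, alt):.2f}, "
      f"p={binomial_two_sided_p(ref, alt):.2g}")
```

A ratio far from 0.5 with a small p-value would flag the site as a candidate for allelic imbalance, which could then be compared against the DNA copy number at the same locus.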
De La Vega said the technique identified genes that had previously been found to be involved in oral cancer as well as a few novel genes that could potentially be cancer-related.
He added that the method is especially good for looking at allelic imbalance because it sequences non-coding regions of RNA as well as coding RNA. "Allelic imbalance is an alteration related to cancer but not necessarily correlated to gene expression. So if you are only looking at gene expression, you may not be capturing all cancer genes," he said. Often, allelic imbalance "doesn't produce a dramatic change in expression, but small changes in regulators can have large changes downstream," he said.
Smith said that the Mayo researchers are now expanding their study to examine 18 tumor/normal pairs of tongue and tonsil cancer. He said they are also doing transcriptome sequencing on Illumina's platform, but they haven't yet done a side-by-side comparison of the two machines.
He said Illumina has advantages in its library preparation, which is easier and more automated, but that Life Technologies recently made some improvements to the SOLiD protocol that the Mayo researchers will be testing in the coming weeks. He added that he thought the SOLiD yielded more data and revealed more of the complexity of the transcriptome, such as the direction of transcription, although there are directional RNA-seq protocols available for the Illumina GA.
The Broad Institute's Levin said that looking at allelic imbalance hasn't been done much before. "What's different [about their transcriptome sequencing study] is the way they looked at allelic imbalance, and were able to identify genes relative to cancer based on allelic imbalance," he said. "That may be a clue that people hadn't fully appreciated previously."
However, he said that the Mayo team's method did not report on other changes to the tumor such as RNA editing, fusion transcripts, and alternative splicing. He thought that because they did not use a paired-end sequencing approach, they were unable to detect those changes. But, he said, those changes had been characterized before, and the Mayo study is unique in that it looks at allelic imbalance.
Levin added that the study demonstrates that looking at allelic imbalance and copy number variation is a good approach for studying cancer and identifying biomarkers, and said the next step would be to do the study on a larger sample size, to really see if the researchers are able to identify cancer-related genes or mutations.
Thomas Stricker, a clinical fellow at the University of Chicago who is involved with the university's own cancer transcriptome sequencing project (see In Sequence 1/26/2010), said that the Mayo researchers' method, and particularly their analysis, was creative.
"RNA-seq allows you to ask questions that you haven't been able to ask before," Stricker said. "It's a recognition that these data sets are not just giving you expression levels. They're telling you a lot more than that." For example, instead of just showing that a certain gene is differentially expressed in cancer, this study demonstrates that you can show that one allele is expressed differentially, and you can then determine whether that is being caused by an amplification, deletion, or single-point mutation. "You can start to answer mechanistic questions about how the genes are regulated," he said, and how that regulation plays a role in cancer.