By Julia Karow
In a comparison of RNA-seq by single-molecule sequencing and amplification-based sequencing to study cancer samples, researchers at the University of Michigan have found that single-molecule sequencing is better at detecting low-abundance transcripts and shows less bias towards highly expressed genes.
The study "points out an important bias that amplification-based methods have" and suggests there is a role for both approaches, said Mark Rubin, a professor of oncology in pathology at Weill Cornell Medical College in New York, who was not involved in the project.
The project, published earlier this month in PLoS One, arose from the Michigan group's interest in studying gene fusions in human cancers. "We wanted to know how gene expression was modified as a consequence of using a single-molecule sequencing approach, and how that would affect our ability to detect chimeric transcripts," said Chris Maher, the senior author of the study and a research investigator at the Michigan Center for Translational Pathology.
At the time — in late 2009 — they were using the Illumina sequencing platform but were interested in evaluating the Helicos single-molecule sequencer, which they were able to use for this study through a collaboration with Helicos.
Though they only compared results from the Illumina Genome Analyzer II and the Helicos HeliScope, the researchers believe their findings likely apply more generally, to any type of amplification-based and single-molecule sequencing approach, according to Maher.
For their study, the scientists sequenced the transcriptomes of 12 prostate cancer samples with both platforms, including four different prostate cancer cell lines — two of them at different time points — and one matched tumor/normal pair.
For each sample, they generated between about 3 million and 20 million raw reads, and between 2 and 15 million filtered reads. They aligned the Illumina reads using the Bowtie aligner and the Helicos reads using IndexDP.
The Illumina platform covered highly expressed transcripts with more reads than the Helicos technology, whereas Helicos covered lowly expressed transcripts more effectively. "This additional coverage of high-concentration transcripts consistently appeared to be at the expense of lower-expressed transcripts, which tended to be more thoroughly sequenced" using single-molecule sequencing, the authors noted.
This difference could have consequences for downstream analyses, for example for quantifying gene expression or for identifying mutations, Maher said.
Also, a subset of genes at the low end of expression that the single-molecule approach detected could not be captured by the amplification-based approach at all. "This was a small subset, but it did, in fact, exist, and we went through a number of computational exercises to make sure it wasn't an artifact of our sequence analysis but in fact, truly genes that we could not actually document using amplification-based methods," Maher said.
He added, however, that generating more reads for each sample on an amplification-based platform might overcome the problem to some extent. "From equal amounts of sequencing, we project that you are going to miss some [transcripts] at the low end," he said. But with more sequencing, "some of the genes that we believe were missing on that low range will likely be detected." With single-molecule sequencing, on the other hand, "we already have better coverage at that lower end."
Eliminating duplicate reads also helps. When the researchers tossed out duplicates they identified by computational means, they started to see "very consistent expression coverage maps between single-molecule and amplification-based approaches," according to Maher. While they might have been overzealous, eliminating some reads from the Illumina data that were in fact not duplicates, they expect that paired-end sequencing, which they now use for their gene fusion discovery work, can help them distinguish between duplicates and non-duplicates.
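The article does not say how the Michigan group identified duplicates. As an illustration only, the sketch below shows one common heuristic for single-end data: treat reads that align to the same chromosome, strand, and start position as copies of one fragment and keep only the first. The `Alignment` record and `collapse_duplicates` function are hypothetical names invented for this example, not part of the study's pipeline.

```python
from collections import namedtuple

# Hypothetical minimal alignment record; real pipelines would read these
# fields from an aligner's output rather than build them by hand.
Alignment = namedtuple("Alignment", ["read_id", "chrom", "strand", "start"])


def collapse_duplicates(alignments):
    """Keep the first alignment seen at each (chrom, strand, start) key.

    Assumption: reads sharing that key are amplification copies. This can
    also discard genuinely independent reads from highly expressed genes.
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln.chrom, aln.strand, aln.start)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


# Toy input: r1 and r2 share a position and strand, so r2 is dropped.
reads = [
    Alignment("r1", "chr1", "+", 100),
    Alignment("r2", "chr1", "+", 100),
    Alignment("r3", "chr1", "-", 100),
    Alignment("r4", "chr2", "+", 250),
]
unique = collapse_duplicates(reads)
```

A position-only key like this is the kind of filter that can be too aggressive, which matches the caveat above. With paired-end data, the key could also include the mate's position, so that two fragments starting at the same base but ending at different ones are kept apart.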
Even though their results indicate single-molecule sequencing is better at detecting lowly expressed transcripts, the Michigan researchers continue to use the Illumina platform in their studies, partly because of a lack of funding for a Helicos sequencer. They currently have one Illumina HiSeq and one GAII in house. Helicos, which has been financially troubled, has also stopped selling its platform to new customers.
But their study made them aware of the bias of amplification-based methods toward highly expressed genes, and the need for filtering duplicates. Unfiltered, the results could point them to "biological processes that might not actually be relevant, say, to a cancer model, but might be the byproduct of the sequencing platform itself," Maher said. "You have to have filters in place that account for that, so you don't have these artificial biases introduced in your expression analysis."
Also, the University of Michigan is in the process of acquiring a Pacific Biosciences sequencer — another single-molecule sequencing platform — and Maher plans to evaluate that instrument as well for transcriptome sequencing. "Currently, we have very good throughput with our Illumina machine, but we also envision the throughput will continue to improve with single-molecule sequencing, and having that direct readout will be helpful."
Rubin agreed that in the near term, there will be a role for both amplification-based and single-molecule sequencing approaches. Amplification-based sequencing "has become the workhorse sequencing platform," he said. "However, SMS should be important initially for helping to understand intricate biological questions."
"The rapid pace of development in the sequencing technology field should ultimately make all these technologies more amenable to individual laboratories, and prices will compete with standard expression profiling chip platforms," he added.