This article was originally published July 6.
By Julia Karow
Digital gene expression analysis on the Helicos Genetic Analysis system complements full-length RNA sequencing methods and promises to be more accurate and less expensive than similar methods on other sequencing platforms because the sample prep is simple and avoids amplification, according to the company.
On Sunday, Helicos researchers published a study in Nature Biotechnology in which they described the method, called single-molecule sequencing digital gene expression, or smsDGE, and used it to quantify gene expression in baker's yeast.
The paper, Helicos' second peer-reviewed article since its Science publication last year that described the basics of its technology (see In Sequence 4/8/2008), focuses on one of several applications the company has been developing for its single-molecule sequencing platform, including genome sequencing, RNA-seq, and copy number variation analysis.
smsDGE generates a single read from the 3' end of a first-strand cDNA molecule, usually representing the 5' end of the corresponding mRNA. It provides strand-specific and accurate transcript counts, according to the article, across a dynamic range of four orders of magnitude. Sample preparation involves no amplification, ligation, or restriction digestion.
The method is "specifically designed for accurate quantification," according to Tal Raz, a senior scientist at Helicos and one of the lead authors on the paper, and complements RNA-seq, another application available for the Helicos platform.
While smsDGE generates a single short read for each transcript, RNA-seq covers the entire length of the transcript and requires more reads as well as prior knowledge of the transcript length for accurate quantification, she said.
Helicos chose yeast for its study because the yeast transcriptome is much better characterized than the human transcriptome, both by array-based methods and by RNA-seq on other high-throughput sequencing platforms. Even so, the Helicos researchers were able to discover novel transcription initiation sites as well as novel genes "that had not been seen by previous technologies," according to Patrice Milos, Helicos' vice president and chief scientific officer.
One of the main advantages of smsDGE over microarrays, shared by other sequencing-based gene expression methods, is that it allows researchers to quantify transcripts one sample at a time, with no need for ratiometric comparisons of sample pairs. The digital nature of the data "allows you to compare a dataset you generated one day to [another one produced] six months later," according to Milos. Also, researchers can discover novel transcripts that may not be represented on an array.
On the cost side, smsDGE is on par with microarrays, according to Helicos. A microarray experiment still costs more than $400, said Helicos President Steve Lombardi, and given that several chips are often needed for each experiment, "you are talking north of $1,000."
Sequencing reagents per channel of the Helicos instrument — which generated approximately 12 million reads in the present study — cost between $250 and $350, according to Lombardi, and there is "virtually no additional cost upfront" for sample preparation.
Sequencing costs are about three to four times higher for generating the same number of reads on the Illumina Genome Analyzer, according to Sorana Morrissy, a researcher at the Genome Sciences Centre at the British Columbia Cancer Agency. She and her colleagues recently published a paper on a tag sequencing method for the Illumina platform called Tag-seq (see In Sequence 6/30/2009). Library prep costs are also likely higher on the GA, she said.
However, since Tag-seq uses adaptors, scientists can barcode individual samples and pool up to 96 samples in a run, which "significantly reduces the cost of Illumina sequencing," she said.
Both smsDGE and Tag-seq are strand-specific, allowing researchers to analyze sense and antisense transcripts, "which has thus far not been possible using other next-generation methods such as RNA-seq," she said.
And while smsDGE can provide some information on transcriptional start sites, Tag-seq generates data on alternative splicing events, she pointed out.
The fact that the Helicos method does not involve RNA amplification should enable scientists to analyze samples with limited RNA content and avoid biases introduced by other methods during the amplification step, according to Morrissy.
That expectation is shared by others. With the Helicos approach, "there is no amplification so you are as close to the native nucleic acid from the cell as you can get without actually sequencing the actual mRNA," said Chad Nusbaum, co-director of the genome sequencing and analysis program at the Broad Institute, which has a Helicos Genetic Analysis system installed.
"This should greatly limit the biases that are brought in by any amplification, ligation, or other enzymatic step … so that you theoretically get a more accurate and less noisy depiction of what's going on in the cell than other methods."
Since they generated the data for the current study, Helicos and several of its collaborators and customers have used smsDGE to analyze species other than yeast. "It works well in human, mouse, and even in plant species," Raz said.
For example, Helicos researchers, in collaboration with scientists from the Children's Oncology Group, a pediatric cancer research cooperative, have used the method to study a variety of tumor samples from patients, according to Milos.
Nusbaum said that the Broad Institute is exploring smsDGE to study gene expression in "a variety of samples," including some with limited input material.
The hope, he said, is to achieve "greater accuracy in representation and, hopefully, lower cost for this counting application," compared to other methods.
For their study, the Helicos scientists sequenced cDNA from S. cerevisiae, generating 240 million reads from six channels of the 50-channel flow cell. Reads had a median length of 33 bases, and the average error rate per base was 4.4 percent to 4.8 percent across the six channels.
After filtering for read length and sequence complexity, they were left with 143 million reads, ranging in length from 24 to 60 bases, which they aligned to the yeast reference genome and to a transcriptome reference library.
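The article does not say how the length and complexity filters were implemented. As a rough illustration only, the Python sketch below keeps reads of 24 to 60 bases (the range quoted above, assumed here to be the cutoffs) and uses Shannon entropy of the base composition as a stand-in complexity measure. The entropy threshold, the function names, and the example reads are all hypothetical and are not taken from the Helicos paper.

```python
import math
from collections import Counter

MIN_LEN, MAX_LEN = 24, 60  # read-length range quoted in the article
MIN_ENTROPY = 1.0          # hypothetical complexity cutoff, in bits per base


def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def keep_read(seq):
    """Keep a read only if it passes both the length and complexity checks."""
    return MIN_LEN <= len(seq) <= MAX_LEN and shannon_entropy(seq) >= MIN_ENTROPY


# Made-up reads for illustration
reads = [
    "ACGTTGCAAGCTTGACCGTAGGCTA",     # 25 bases, mixed composition: kept
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAA",  # 28 bases, homopolymer: dropped
    "ACGTACGTACGT",                  # 12 bases, too short: dropped
]
kept = [r for r in reads if keep_read(r)]
```

Entropy of base composition is only one possible definition of sequence complexity; a production filter might instead look at homopolymer runs or dinucleotide repeats.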
They were able to map 86 million reads stringently to the genome, and 78 million to at least one transcript. They then assigned each read to a transcript, generating counts that are reported as transcripts per million (t.p.m.).
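Because smsDGE yields one read per transcript molecule, the counting step can in principle be a simple proportion with no correction for transcript length. The Python sketch below shows that reading of "transcripts per million". It assumes each read has already been assigned to exactly one transcript, and the function name and transcript labels are hypothetical. The article does not describe how the authors handled reads that matched more than one transcript.

```python
from collections import Counter


def counts_to_tpm(read_assignments):
    """Turn a list of per-read transcript labels into transcripts per million."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    # One read stands for one transcript molecule, so no length normalization
    return {tx: n * 1_000_000 / total for tx, n in counts.items()}


# Toy input: ten reads assigned to three placeholder transcripts
reads = ["tx_A"] * 6 + ["tx_B"] * 3 + ["tx_C"]
tpm = counts_to_tpm(reads)
```

In this toy case tx_A accounts for six of ten reads, so it comes out at 600,000 t.p.m. Values scaled this way can be compared across runs with different total read counts, which is the property Milos describes for digital data.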
Overall, they measured 6,086 transcripts of the 6,711 putative yeast open reading frames at an abundance of between 1 and 16,000 t.p.m., and 5,376 at more than 10 t.p.m.
The results showed "high agreement with a transcript level profile previously measured for 5,460 genes using oligonucleotide arrays," according to the article, and counts spanned at least four orders of magnitude "with higher resolution of low abundance transcripts … than was demonstrated in the microarray study."
They also had "high agreement to published transcript counts" for a different yeast strain that another research group analyzed previously using RNA-seq on the Illumina GA.
In addition, the Helicos counts agreed with most of 33 qPCR measurements of the same mRNA sample.
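As a quick check on the figures reported above, the short Python calculation below works out what share of the 6,711 putative open reading frames was detected and how many orders of magnitude separate 1 and 16,000 t.p.m. The variable names are illustrative, and the inputs are the numbers quoted in this article.

```python
import math

total_orfs = 6711      # putative yeast open reading frames
detected = 6086        # transcripts measured at 1 t.p.m. or more
above_10_tpm = 5376    # transcripts measured at more than 10 t.p.m.
min_tpm, max_tpm = 1, 16000

frac_detected = detected / total_orfs       # share of ORFs detected
frac_above_10 = above_10_tpm / total_orfs   # share above 10 t.p.m.
orders = math.log10(max_tpm / min_tpm)      # dynamic range in orders of magnitude

print(f"{frac_detected:.1%} detected, {frac_above_10:.1%} above 10 t.p.m., "
      f"{orders:.1f} orders of magnitude")
```

The result is about 91 percent of open reading frames detected, about 80 percent above 10 t.p.m., and a span of roughly 4.2 orders of magnitude, in line with the "at least four orders of magnitude" cited from the paper.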
The Helicos researchers are considering paired reads as one way to develop the method further, which would allow them to quantify specific isoforms of the same transcript.
They are also working on using smaller amounts of input material, with the next goal of starting with 50 to 100 cells, according to Milos.
"What we need to do is figure out how to simplify the sample prep even more in order to make that happen, and we have some interesting collaborations in that space that we have not announced yet" that would allow the company "to look at things like stem cells," according to Lombardi.