By Monica Heger
A combination of transcriptome sequencing techniques has revealed clues about the mechanisms of neural cell differentiation, according to a paper published last week in the Proceedings of the National Academy of Sciences.
In the paper, researchers from Stanford University describe how they sequenced the transcriptomes of human embryonic stem cells in various stages of their differentiation into neural cells, using both the Illumina Genome Analyzer and the 454 GS FLX with Titanium chemistry. They identified both known and previously unannotated transcripts as well as spliced isoforms specific to the differentiation steps. The researchers say their findings could eventually lead to a better understanding of neurodegenerative diseases, such as Parkinson's and Alzheimer's, and, ultimately, new therapies.
"If you want to know how a neural cell works, you need to know what proteins are being expressed," said senior author Michael Snyder, genetics chair at Stanford University and director of the recently founded Stanford Center for Genomics and Personalized Medicine. Sequencing the transcriptomes of neural stem cells throughout various stages of differentiation "will let us know all the steps involved, and then ultimately how to control it."
Snyder and his team sequenced undifferentiated human embryonic stem cells, stem cells at an early stage of differentiation, cells that only produce neurons upon further development, and cells that produce both neurons and glial cells.
The researchers used both single-end and paired-end sequencing strategies on the Illumina GA, as well as a long-read sequencing strategy on 454. They generated 140 million uniquely mapped 35-base single-end reads, 15 million uniquely mapped 35-base paired-end reads, and 1.5 million uniquely mapped 250- to 450-base reads from 454. The paired-end reads were from cDNA fragments of 300 base pairs, 300 to 600 base pairs, and 1,000 base pairs.
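For a rough sense of scale, the read counts and lengths reported above can be converted into approximate mapped-base totals. This is only a back-of-the-envelope sketch: it assumes each of the 15 million paired-end reads is counted as a single 35-base read rather than as a pair, and the function and variable names are illustrative, not taken from the paper.

```python
# Read counts and lengths as reported in the article.
# Assumption: each paired-end "read" is one 35-base read, not a pair of reads.
datasets = {
    "Illumina single-end": (140_000_000, 35, 35),
    "Illumina paired-end": (15_000_000, 35, 35),
    "454 long reads": (1_500_000, 250, 450),
}


def yield_range_gb(n_reads, min_len, max_len):
    """Return the (low, high) number of mapped bases, in gigabases."""
    return n_reads * min_len / 1e9, n_reads * max_len / 1e9


for name, (n_reads, min_len, max_len) in datasets.items():
    low, high = yield_range_gb(n_reads, min_len, max_len)
    print(f"{name}: {low:.3f} to {high:.3f} Gb")
```

Under these assumptions, the single-end data set accounts for most of the mapped bases, which is consistent with Snyder's point about sequencing depth.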
The single-end sequencing strategy allowed the team to sequence to a greater depth than the paired-end and long-read approaches because it yielded much more data at a lower cost, while the paired-end and long-read sequencing allowed them to sequence across exons and reconstruct entire transcripts, said Snyder. He added that he would probably not use single-end sequencing in future studies, but at the time it was more cost-efficient.
"The paired-end reads really let us get a lot more information about different transcripts from the same genes," Snyder said, "while the longer reads allowed us to link multiple exons together. But, you can't do lots of long reads because it's much more expensive."
Snyder said the most significant finding was a phenomenon called isoform specialization, in which undifferentiated stem cells have a high diversity of isoforms that is lost as the cells differentiate. "Embryonic stem cells start out as a giant smorgasbord: they're loaded with lots of splice messages. But as they differentiate, only a subset of specialized messages seem to arise," he said.
Martin Hirst, a research associate at the BC Cancer Agency Genome Sciences Center in Vancouver, who last year used transcriptome sequencing to identify a novel mutation in a rare ovarian cancer (see In Sequence 6/16/2009), agreed that the isoform specialization was an interesting finding. "It's consistent with our notions of how the genome is being restructured" to focus on whatever cell subtype it is developing into, he said.
He also noted that the Stanford researchers identified previously unannotated transcriptionally active regions and isoforms, and also showed that those isoforms and transcriptionally active regions are specific to different stages of differentiation.
"This is one of the most comprehensive gene expression studies of this differentiation cascade," said Hirst.
He said that the combination of platforms allowed the Stanford team to obtain such a detailed view of neural stem cell differentiation. "The long-read platform gives you the ability to stitch together longer exon-to-exon relationships. In other words, it gives you context," he said.
Hirst added that short-read technology has improved since the study was conducted. Instead of 35-base single-end and paired-end reads, scientists now routinely obtain 75-base paired-end reads and do not sequence single-end reads at all.
Hirst said that combining the longer reads of 454 with the shorter, paired-end Illumina reads is still a useful technique despite Illumina's increased read lengths, because the average mammalian transcribed gene is about 1.5 kilobases long. Even if reads are long enough to span an entire exon, it is still not clear how that exon is related to surrounding exons without longer reads that can span multiple exons.
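Hirst's reasoning about read length can be illustrated with a small sketch that asks how many consecutive exons a single read could possibly link in a spliced transcript. The exon lengths below are hypothetical, chosen only so that they sum to the 1.5 kilobases he cites, and the function `max_exons_linked` is an illustrative helper, not part of any published method.

```python
def max_exons_linked(exon_lengths, read_len):
    """Largest number of consecutive exons one read can touch.

    To link exons i..j, a read needs at least one base in exon i,
    one base in exon j, and every base of the exons in between.
    """
    if not exon_lengths or read_len < 1:
        return 0
    best = 1
    n = len(exon_lengths)
    for i in range(n):
        interior = 0  # total length of exons strictly between i and j
        for j in range(i + 1, n):
            if 2 + interior <= read_len:
                best = max(best, j - i + 1)
            interior += exon_lengths[j]
    return best


# Hypothetical exon lengths (bases) for a 1.5-kilobase transcript.
exons = [150, 120, 200, 180, 250, 140, 160, 300]

for read_len in (35, 75, 400, 1500):
    linked = max_exons_linked(exons, read_len)
    print(f"{read_len}-base read: links at most {linked} exons")
```

In this toy transcript, going from 35-base to 75-base reads does not increase the number of exons a single read can connect, while a read as long as the whole transcript connects all of them, which is the direction Hirst describes.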
"If you could read [the entire gene] in a single read, that would allow you to fully annotate the isoforms, and that's where we really need to go," he said. "And, I think this is a step in the right direction."