Two research teams have used Illumina’s Genome Analyzer to independently study alternative splicing events in the human transcriptome, while a third group has used customized microarrays in a similar study.
According to Chris Burge, an associate professor of biology and biological engineering at the Massachusetts Institute of Technology who led one of the sequencing studies, the data obtained from sequencing is at least as good as that from microarrays designed to detect alternative splicing.
“It’s quite clean, [and] you can get quantitative estimates of gene expression that are at least as quantitative as anything you can get with arrays, and you have this very fine resolution to distinguish between even very similar isoforms that differ only by a few bases, yet may change the reading frame of the protein and therefore may have a drastic effect,” he said.
He and his team used an older version of Illumina’s mRNA-Seq protocol to generate between 12 million and 29 million 32-base reads from cDNA fragments derived from 10 human tissues and five breast epithelial or breast cancer cell lines.
After mapping the reads to known and predicted exon-exon junctions, they found that more than 90 percent of human genes undergo alternative splicing. Most of this variation occurs between tissues, although it also exists between individuals to a lesser degree. The researchers published their results online in Nature on Sunday.
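Mapping reads to exon-exon junctions generally means building a library of junction sequences, each joining the end of one exon to the start of a downstream exon, and then looking for reads that cross the join. The Python sketch below illustrates that idea only and is not the Burge team’s pipeline. It assumes 32-base reads, exact matching (real aligners tolerate mismatches) and a hypothetical minimum overhang of 4 bases on each side of the junction; the function names and toy exons are invented for illustration.

```python
from itertools import combinations

READ_LEN = 32     # read length used in the studies described here
MIN_OVERHANG = 4  # hypothetical: bases a read must cover on each side of a junction


def junction_sequences(exons, read_len=READ_LEN, min_overhang=MIN_OVERHANG):
    """Build a junction library for every exon pair (i, j) with i < j.

    Each entry keeps the last (read_len - min_overhang) bases of exon i and
    the first (read_len - min_overhang) bases of exon j, which is enough to
    contain any full-length read that crosses the join with the required
    overhang. Pairs with j > i + 1 represent exon-skipping junctions.
    """
    flank = read_len - min_overhang
    return {
        (i, j): (exons[i][-flank:], exons[j][:flank])
        for i, j in combinations(range(len(exons)), 2)
    }


def map_to_junctions(read, library, min_overhang=MIN_OVERHANG):
    """Return the junctions that a read matches exactly while crossing the join."""
    hits = []
    for key, (up, down) in library.items():
        seq = up + down
        start = seq.find(read)
        while start != -1:
            left = len(up) - start               # read bases in the upstream exon
            right = start + len(read) - len(up)  # read bases in the downstream exon
            if left >= min_overhang and right >= min_overhang:
                hits.append(key)
                break
            start = seq.find(read, start + 1)
    return hits


# Toy exons (hypothetical), each a single repeated base so matches are easy to check.
exons = ["A" * 40, "C" * 40, "G" * 40]
library = junction_sequences(exons)
skip_read = "A" * 10 + "G" * 22  # crosses the exon 0 -> exon 2 (exon-skipping) junction
print(map_to_junctions(skip_read, library))  # [(0, 2)]
```

A read lying entirely within one exon, such as 32 consecutive A’s here, matches no junction in this sketch and would instead be counted by mapping to the genome.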
Burge said he first learned about the transcriptome-sequencing capabilities of the Illumina Genome Analyzer at a conference a year ago and wondered whether the instrument could be used to study alternative splicing.
“I was a little concerned, at the beginning, that 32 bases might be too short, but we said we would give it a try,” he told In Sequence last week.
Illumina generated most of the study’s sequence data on the first version of its Genome Analyzer; the National Center for Genome Resources in New Mexico generated a portion, from several cerebellum samples.
The researchers found that about 60 percent of the reads mapped uniquely to the genome and about 20 percent mapped non-uniquely, mostly to paralogs.
In addition, about 4 percent of the reads mapped to splice junctions, “and these reads in particular are really useful for discriminating between different alternative splice forms,” according to Burge, whose lab has studied alternative splicing for about 10 years. “That basically means 32 bases is long enough,” he said.
Burge said he had considered the 454 platform, whose longer reads are “certainly good for making inferences about whole mRNA isoforms.” But “you get an order of magnitude more depth with Illumina short-read sequencing, and that more than makes up for the shorter length for this type of study, so we feel this is better at the moment,” he said.
Burge acknowledged one limitation of short, single-end Illumina reads: they cannot show whether two splice events in the same gene are regulated independently or together.
“That, you could potentially see with longer reads, or with paired-end sequencing, so we are moving to paired-end sequencing for some applications on the Illumina platform,” he said.
One initial challenge with the Illumina data was that read density varied across exons, even in exons that were not alternatively spliced. However, these sampling biases were highly reproducible between tissues and samples, so “the relative gene expression values are highly accurate and correlate extremely well with qPCR data,” according to Burge, though absolute expression values correlated “somewhat less well.” New mRNA-Seq protocols introduced by Illumina have since reduced this problem, he added.
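Why would reproducible biases leave relative values accurate but not absolute ones? A simplified model, assumed here for illustration rather than taken from the study, makes the point: suppose the read count n_{g,s} for gene g in sample s is a per-sample depth factor c_s, times a gene-specific sampling bias b_g that is the same in every sample, times the true expression x_{g,s}.

```latex
n_{g,s} \approx c_s \, b_g \, x_{g,s}
\quad\Longrightarrow\quad
\frac{n_{g,s}/c_s}{n_{g,t}/c_t} \approx \frac{b_g \, x_{g,s}}{b_g \, x_{g,t}} = \frac{x_{g,s}}{x_{g,t}}
```

Under this model the bias b_g cancels when the same gene is compared between samples s and t, but the depth-normalized absolute estimate n_{g,s}/c_s ≈ b_g x_{g,s} still carries it, which is consistent with absolute values correlating less well with qPCR.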
His lab previously used microarrays such as Affymetrix all-exon arrays, which are useful for studying 3’ untranslated regions and skipped exons. However, these arrays are not suited to other types of splicing events, such as alternative 5’ and 3’ splice sites, where the alternative sites may lie only a few bases apart.
“With almost any array technology you can imagine, there is not much space to put probes there to distinguish these two isoforms,” Burge said, “but with the sequencing, you can unambiguously distinguish those two isoforms if you have reads that map across the junctions.” Also, until very recently, Affymetrix arrays did not include probes across splice junctions, he said.
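Burge’s point about splice sites a few bases apart can be illustrated with a toy example. In the hypothetical Python sketch below, a longer isoform uses a 5’ splice site four bases further downstream, keeping four extra bases at the end of the upstream exon, and a shorter isoform does not; a read counts toward an isoform only if it crosses that isoform’s junction and no other. The sequences, reads and function names are invented, matching is exact, and the simple ratio at the end ignores the sampling biases described earlier; it is not the method used in any of the studies.

```python
from collections import Counter


def spans_junction(read, up, down):
    """True if `read` occurs in up + down at a start position that crosses the join."""
    seq = up + down
    lo = max(0, len(up) - len(read) + 1)         # earliest start that reaches into `down`
    hi = min(len(up), len(seq) - len(read) + 1)  # start inside `up`, read must fit in seq
    return any(seq.startswith(read, s) for s in range(lo, hi))


def junction_support(reads, junctions):
    """Count reads that cross exactly one isoform's junction (unambiguous reads)."""
    counts = Counter()
    for read in reads:
        hits = [name for name, (up, down) in junctions.items()
                if spans_junction(read, up, down)]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Hypothetical toy isoforms: "long" keeps four extra exonic bases (GCTG)
# before the junction; "short" splices four bases earlier.
upstream_long = "A" * 10 + "GCTG"
upstream_short = upstream_long[:-4]
downstream = "C" * 10
junctions = {"long": (upstream_long, downstream),
             "short": (upstream_short, downstream)}

reads = ["AAGCTGCC", "CTGCCCCC", "AGCTGCCC",  # cross the long junction
         "AAAACCCC", "AAACCCCC",              # cross the short junction
         "AAAAAAAA", "CCCCCCCC"]              # inside one exon: uninformative
counts = junction_support(reads, junctions)
fraction_long = counts["long"] / (counts["long"] + counts["short"])
print(counts["long"], counts["short"], fraction_long)  # 3 2 0.6
```

Only the reads that cross a junction separate the two forms; reads inside a single exon are shared by both isoforms and are discarded here.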
The microarrays used in another study — published online in Nature Genetics on Sunday by a group at Rosetta Inpharmatics, a unit of Merck — did contain probes to measure alternative 5’ and 3’ splicing events. But according to Burge, the researchers did not analyze those data, “so it remains to be seen how well that works.”
The Rosetta group, led by Jason Johnson, designed a set of 17 whole-transcript arrays, custom-built by Agilent Technologies, to monitor the expression of more than 24,000 alternative splicing events in 48 human samples. They found that more than 11,700 genes and 9,500 splicing events were differentially expressed.
But the sequencing approach used by Burge and his team also identified approximately 1,400 new exons with high confidence, as well as thousands of potential new splice junctions, by mapping the reads against predicted exons and junctions. “One of the advantages of deep sequencing is that it’s a completely unbiased approach in that you don’t have to make any assumptions about what genes are expressed, or what exons or junctions might exist,” Burge said.
His results were, by and large, similar to those of a third study, by a team led by Benjamin Blencowe at the University of Toronto, which also used mRNA-Seq on the Illumina Genome Analyzer and published its results online on Sunday in Nature Genetics.
Those researchers generated between 17 million and 32 million 32-base reads for each of seven human tissues and mapped them against libraries of splice junction sequences representing both known and suspected splicing events. They reported that 95 percent of multi-exon human genes undergo alternative splicing, comparable to the 98 percent estimated by the Burge study.
In the future, Burge and his colleagues plan to study the regulation of splicing events further. One approach is to knock down or knock out a splicing regulatory factor and then use sequencing to measure how the expression of alternative isoforms changes. They are also using CLIP-Seq, short for crosslinking and immunoprecipitation followed by high-throughput sequencing, a method invented in Bob Darnell’s lab at Rockefeller University, to map where those same RNA-binding regulatory factors bind. “We think that is going to be a very powerful way to dissect the function of splicing factors because you learn where it binds to the transcriptome at great depth … and then, you also see from the knockdown which of those events are actually regulated and in which direction, and you start to make inferences about the function of those factors,” Burge said.
“Most of the experiments going on in my lab right now are using the Illumina sequencing — we have a lot of confidence in the technology.”