The Northwest microarray conference, held in Seattle earlier this month, was like a mug of hot black coffee: deep-brewed science, but without the creamy PowerPoint product pitches. Overall, the talks focused on how scientists in academic and private labs are taking microarrays to the next step by optimizing reagents, sample prep, and experiment design, and are going beyond simple data clustering to integrate microarray data with biological and clinical data.
“Now we are starting to enter an era where array technology is not the end all and be all, where we are starting to couple array technology with all sorts of global measures to see if we can’t get biological meaning out of it,” said Roger Bumgarner, the University of Washington researcher who organized the conference, which drew about 100 participants and was held at the city’s shiny new Benaroya Music Hall. Following is a summary of some key presentations.
On the first day, Thomas Gingeras from Affymetrix shed some light on the company’s work with the ‘dark matter’ of the genome. In this project, Gingeras and other scientists at Affymetrix and NCI are using special arrays that contain 25-mer probes spaced, on average, every 35 base pairs along the chromosomes in the genome. The goal has been “basically to develop an empirical map of the transcriptionally active regions of the human genome, then compare these to the [sequenced] human genome map,” Gingeras explained. Rather than using the oligos on the company’s commercially available gene chips, the scientists applied “an ab initio approach, where we walked our way through the genome” looking for transcripts. In the initial published work on chromosomes 21 and 22, they used 11 different cell lines, and found that the RNA profile of a cell line varied depending on how they fractionated the sample.
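As a rough illustration of the tiling arithmetic described above, the following Python sketch lays 25-mer probes along a stretch of sequence at a fixed 35-base spacing. The function name, the 1,000-base example region, and the reading of "every 35 base pairs" as a start-to-start distance are assumptions made for illustration; the talk described the spacing only as an average, and this is not the Affymetrix design procedure.

```python
def tile_probe_starts(region_length, probe_length=25, spacing=35):
    """Return 0-based start positions of probes placed every `spacing` bases.

    Hypothetical helper: assumes uniform start-to-start spacing and that
    every probe must fit entirely inside the region.
    """
    last_start = region_length - probe_length
    return list(range(0, last_start + 1, spacing))


# Example: a made-up 1,000-base region.
starts = tile_probe_starts(1000)
probe_count = len(starts)

# Under these assumptions each 35-base step holds one 25-base probe,
# leaving a 10-base gap between consecutive probes.
gap_between_probes = 35 - 25
```

Under these assumptions, about 25 of every 35 bases are directly interrogated by a probe, which is why such an array can pick up transcripts that fall outside annotated genes.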
“At the 60 thousand foot level, there was a good correlation between the exon density and probe density on a global level,” Gingeras said. But when they looked much closer, “transcription occurred in many unannotated regions.” What is interesting is that many of these intragenic transcripts corresponded to predicted genes from gene prediction programs and conserved regions from other animals. “I keep an open mind as to what a gene is,” said Gingeras. “It’s not just [a sequence] that can be translated into a protein.” Most of the RNA they are finding “is not coding, but more regulatory and structural in nature.”
Gingeras said the company next plans to launch a collaboration with the software company BioTique. The product of this collaboration will be an online search tool that allows users to query the transcriptome data that the company has published so far. Within the next year, Affymetrix and the NCI plan to publish a map containing transcriptome data from 25 percent of the genome, including chromosomes 1, 7, 14, 19, and others. They expect the entire project to take three years.
If the hot Starbucks java and cold fruit salad served for breakfast were not enough to wake everybody up for the second day of the conference, a rapid-fire presentation by TIGR’s John Quackenbush surely was. Quackenbush opened his talk with a slide showing a picture of President Bush next to that of a weasel, with a diagram of a hypothetical microarray experiment comparing RNA from both organisms. Quackenbush went on to quip that he is employing a philosophy of “compassionate fascism” to enforce the MIAME standard, the microarray gene expression database working group’s standard for microarray data submission: he will try to force everyone to adopt the data standard if they know what’s good for them.
Seriously though, Quackenbush seemed to admit that his lab has surrendered to the reality that the microarray community has adopted Affymetrix chips as a de facto standard: In surveying a number of different studies on microarrays and cancer, Quackenbush only looked at studies that used Affymetrix chips because they could be combined, at least within chip type.
This survey was more than a meta-analysis. Quackenbush’s group sought to combine a massive amount of data on a number of different tumors in order to derive tumor classification algorithms. “We wanted to be pretty comprehensive,” Quackenbush said. The group employed a supervised neural network approach and was able to train the network to classify many of the major tumors with 90 to 95 percent accuracy.
This type of data superset is what is needed in order to make microarray data clinically relevant, according to Quackenbush. At a recent NCI meeting convened to decide whether the tools exist for microarrays to have clinical impact, he said, every speaker had the same message: “There are a lot of tools out there, people can tweak them and make them work: what we need now are really big data sets.” The tumor classification study is now under review at a major scientific publication, Quackenbush said.
Solving the Translation Problem
David Morris of the University of Washington department of microbiology discussed how his lab has used DNA microarrays to look at mRNA in ribosomes. This process, which the team calls Translation State Array Analysis (TSAA), involves fractionation of polysomes by sucrose gradient centrifugation, then hybridization of the resulting fractions to the microarrays. In a process the group terms “low resolution TSAA,” the RNA sample is partitioned into two fractions, one poorly loaded and one well loaded with ribosomes. The two fractions are then hybridized to two arrays, and the levels of various RNA transcripts are compared to derive “a ratio of well-translated to under-translated messages,” Morris said.
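The ratio Morris described can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the log2 scale, the pseudocount, and the intensity values are all assumptions, and the talk did not specify how the group normalizes or computes its ratios.

```python
import math


def translation_ratios(well_loaded, poorly_loaded, pseudocount=1.0):
    """Return log2(well-loaded / poorly-loaded) signal per transcript.

    Hypothetical helper: inputs are dicts of transcript name -> array
    intensity. The pseudocount (an assumption) avoids division by zero.
    Transcripts missing from either array are skipped.
    """
    ratios = {}
    for transcript, well in well_loaded.items():
        if transcript not in poorly_loaded:
            continue
        poor = poorly_loaded[transcript]
        ratios[transcript] = math.log2((well + pseudocount) / (poor + pseudocount))
    return ratios


# Made-up intensities for three transcripts on the two arrays.
well = {"geneA": 799.0, "geneB": 99.0, "geneC": 49.0}
poor = {"geneA": 99.0, "geneB": 99.0, "geneC": 199.0}

ratios = translation_ratios(well, poor)
# Positive values suggest a message found mostly in the well-loaded fraction,
# negative values one found mostly in the poorly loaded fraction.
```

In this toy example, geneA would count as well translated, geneC as under-translated, and geneB as evenly split between the two fractions.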
While low-resolution TSAA is “simple and straightforward,” with one gradient per array, and is computationally simple, involving one Excel spreadsheet, it does not pick up more subtle differences in RNA-ribosome interactions, Morris said. The group has also started moving to a higher-resolution system, which involves 25 arrays per gradient, but this is more expensive than low-resolution analysis. Still, the overall benefit of TSAA is that “you get a pretty good idea of what’s going on in translating mRNA to protein.”
In a similar vein, Scott Tenenbaum of Duke’s Center for RNA Biology described how he and his colleague Jack Keene have developed a system that uses DNA microarrays to profile mRNA binding proteins. This system, which they call En Masse Nuclear Run-on analysis, involves isolating endogenously formed mRNP complexes using immunoprecipitation, then hybridizing them to spotted cDNA microarrays and looking for similar cis-elements among clustered mRNAs. “The point is to capture the mRNA binding protein that is interacting with a message in a cell,” Tenenbaum explained. This process, which the team recently published in Molecular Cell (2002, 9, 1161), has turned up “unique subpopulations” of mRNAs associated with different binding proteins, the composition of which varies depending on cell conditions. Also, a single mRNA can be found in multiple complexes – adding further weight to the view that transcription is a complex process involving more than just a single message from a single gene to a single protein.
On the technique side, Maria Tretiakova, a pathologist visiting the University of Chicago from St. Petersburg, Russia, addressed the question of whether laser capture microdissection (LCM) biases microarray results. Tretiakova and her colleague Chris Dyanov compared three samples of fresh kidney tissue, one that was randomly microdissected using the state-of-the-art Leica non-contact UV laser microdissection system, and two that were prepared through standard methods. The microdissected sample and one of the standard samples, each of which totaled 100 ng, underwent two rounds of standard linear amplification. The third sample, which was 5 µg, served as a control for the effects of amplification. All three were then hybridized to Affymetrix U95A microarrays. Tretiakova and Dyanov found an 89 percent overlap in gene expression levels between the LCM and non-LCM samples, and concluded that LCM did not bias the results of the hybridization. Some audience members disputed the finding, suggesting that the difference in gene expression levels was significant. But most agreed that LCM is preferable to other methods for sample procurement.
In a related talk, Jiangning Li from the UW department of microbiology discussed the limits of T7 linear amplification. “We recommend that the minimum amount of mRNA and tRNA for one-round amplification is no less than 50 ng and 100 ng, [respectively],” Li said. After amplifying HeLa and HeLaE4 cell RNA using the Eberwine protocol – the classic publicly available linear amplification procedure – and then evaluating the resulting RNA with the Agilent Bioanalyzer, Li and colleagues found that a starting amount of less than 50 ng resulted in short RNA strands of less than 200 bases.