By Meredith W. Salisbury
Standards of measurement have always been imperative, from back in the day when a “foot” was reportedly defined by the distance from the king’s heel to his toe. Today, such standards have only increased in importance — and in the world of microarrays, it seems that everyone’s trying to bronze a new foot.
Microarray technology has existed for more than 15 years now, and debate continues as to the accuracy and reliability of the measurements emerging from it. Why is it that two researchers in different labs using exactly the same technology platform and the same samples can get significantly different results? Why is it so hard to compare data across array platforms? It seems that every few months someone publishes a paper highlighting the vagaries of this technology — papers that are “either sobering or scary,” says Pat Hurban, the director of investigational genomics who heads up the array lab at Icoria.
In earlier days, these challenges were part of the charm of a young, lovably eccentric technology, and fine-tuning protocols to get the highest possible confidence in an array experiment was largely an academic matter. Now, as chips stand on the brink of becoming a critical diagnostic tool in the clinic, the technology no longer has that luxury. Activity in the field has heated up recently, with a number of initiatives, including both public-sector and private-sector participants, launched to navigate the tricky terrain of microarray standardization. “I think what you’re seeing is a maturation of the technology and an understanding that it’s really closer to clinical application, significantly closer than it was even a couple of years ago,” says Marc Salit, a program analyst at the National Institute of Standards and Technology.
Part of what’s making standardization so difficult is the number of variables involved in running an array experiment, says Tim Sendera, who heads up the genomics and bioinformatics group for GE Healthcare’s CodeLink array line. He points to user variability and differences in computational techniques and data interpretation as some of the major forces acting against an easy solution for array comparability.
Several government agencies, including FDA, NIST, and the National Institute of Environmental Health Sciences, have stepped in to support groups working on these problems. The major array manufacturers, among them Affymetrix, Agilent, Applied Biosystems, and GE Healthcare, are active participants in many of these groups. “It’s in all of our interests to show that in fact microarrays are reproducible and they do behave well,” says John Burrill, a senior application scientist at Applied Biosystems.
One of those initiatives, the Microarray Quality Control project (known as the MAQC), was set to begin the last of its wet-lab experiments last month, Sendera says. Meanwhile, some of the other standardization efforts are just getting underway. While the results of all of these initiatives remain to be seen, one thing is certain: the next several months will be a particularly interesting and defining time for microarray technology and the scientists who rely on it.
Array of Problems
There’s no shortage of hurdles facing scientists trying to come up with array standards. One overarching issue is managing expectations: making the tool as precise and reproducible as possible without losing sight of the reality that arrays will never be as simple as other tools. “You have to remember that it’s a microarray,” says Icoria’s Hurban. “It’s not RT-PCR, and you shouldn’t hold it to exactly the same standard.”
That said, no one’s abandoning the quest for some semblance of a standard. Chip vendors are eager to do what they can — indeed, most manufacturers include at least basic control measures on their chips now — but there’s a recognition throughout the field that no single supplier’s solution is going to be the answer. “This needs to be independent of platform so that we can ensure reliability of this data from different sources,” Sendera at GE Healthcare says.
There are two obvious places on which standardization efforts can focus: the arrays themselves, and the informatics involved in data analysis on the back end. Researchers are spending significant brain power on both sides of the equation.
As far as chips go, the biggest questions revolve around what actually ends up on the substrate. “If the RNA isn’t exactly the same or is degraded a bit, that tends to lead to increased variability,” says Burrill at ABI. Identifying a metric to evaluate RNA quality is therefore a priority, according to Karol Thompson, molecular toxicology team leader in the division of applied pharmacology research at FDA’s Center for Drug Evaluation and Research (CDER). That kind of measurement would help “determine overall performance level” of an array experiment, she adds.
Differences in probe content from one manufacturer to the next add to the confusion, of course. Most manufacturers release at least some basic sequence information about the probes they put on their chips, but many people argue that there’s still ground to cover in making this part of the problem more transparent. “At the microarray companies, they’re really working in good spirit to make sure their customers are successful,” says NIST’s Salit, who is also team leader of the institute’s Metrology for Gene Expression initiative. “But one of the things that has to happen is we’re going to have to play more and more with open hands. The probe content has to be available.”
Manufacturers are beginning to respond to that need. “Affymetrix has released some data sets that show the different concentrations of RNA, how sensitive the chips are at identifying gene expression at different concentrations,” says Michael O’Connell, life sciences director at Insightful. “There’s really not enough of those types of data sets.”
More such data sets could go a long way toward settling inter-platform differences. Scientists aren’t insisting that content be the same across all chip providers, Salit says; rather, “what’s important is that people be able to compare results and understand why they’re different.”
All of the manufacturers GT spoke with indicated that they’re trying to provide as much annotation as possible to help customers have a better handle on which results can be fairly compared. Still, there’s some disconnect among vendors on which sources to use for probe content. Erik Bjeldanes, product manager for gene expression at Agilent, says his company “is now using public data sources” to guarantee that all sequences have been peer-reviewed and can be openly accessed by anyone using the chip. Burrill notes that ABI uses some data from its proprietary Celera database in addition to information from leading public-sector sources.
No matter how well you know what’s spotted on the chip, there are always ways to cause confusion during the experiment itself: hybridization energetics and chemistry, to name a couple. One idea that researchers favor is a simple indicator on the array itself that would help users know whether an experiment worked or not. This would “help people understand the quality of their experiment based on the performance of [certain] controls,” Salit at NIST says. A red light could show it didn’t work and a green or yellow light could show it did, for instance. “It’s that dashboard idiot light, if you will,” Salit says.
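Salit’s “dashboard idiot light” can be illustrated with a minimal sketch. It assumes each control feature comes with an expected signal range, then maps the fraction of controls that land in range to a red, yellow, or green verdict. The control names, ranges, and the 80 percent yellow threshold are all hypothetical, not any vendor’s or consortium’s actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Control:
    """One control feature on the array (names and ranges are hypothetical)."""
    name: str
    signal: float
    low: float   # lowest acceptable signal for this control
    high: float  # highest acceptable signal for this control


def control_light(controls, yellow_fraction=0.8):
    """Map the share of in-range controls to 'green', 'yellow', or 'red'.

    yellow_fraction is a hypothetical threshold, not a published standard.
    """
    if not controls:
        raise ValueError("no controls supplied")
    in_range = sum(c.low <= c.signal <= c.high for c in controls)
    fraction = in_range / len(controls)
    if fraction == 1.0:
        return "green"   # every control behaved as expected
    if fraction >= yellow_fraction:
        return "yellow"  # most controls passed; inspect the failures
    return "red"         # too many controls failed; treat the run as suspect


controls = [
    Control("spike_low", 120.0, 80.0, 200.0),
    Control("spike_mid", 950.0, 600.0, 1500.0),
    Control("spike_high", 9000.0, 5000.0, 15000.0),
    Control("blank_1", 15.0, 0.0, 50.0),
    Control("blank_2", 70.0, 0.0, 50.0),  # blank shows too much signal
]
print(control_light(controls))  # yellow
```

A single failed blank out of five drops this run from green to yellow; a real scheme would likely weight controls differently by type.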
Another step that could help is reducing the level of user variability. Today, one of the reasons array experiments are something of an art form is how much human involvement is needed to run them. Sendera at GE Healthcare says his team has “initiatives to fully automate the process from start to end” — an improvement that could greatly bolster the reproducibility of the technology.
All of those elements relate only to the physical array. There are just as many problems on the data side, with myriad software options and plenty of disagreement about how data normalization across platforms should work. Data preparation techniques are an essential part of making experiments mesh properly, says Burrill. Much of that relates to “which [data points] they choose to use, which ones they leave out because they didn’t perform well. At the end of the day when they talk about reproducibility, ‘I got gene list A, I got gene list B, are they the same?’, all the steps leading up to a gene list, if they’re not exactly the same” then you’ve got to expect to get different lists, Burrill says.
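Burrill’s “gene list A versus gene list B” question can be made concrete with a simple overlap measure. The sketch below uses the Jaccard index (genes on both lists divided by genes on either list), which is one common choice rather than a method any of the groups quoted here has endorsed; the gene names are placeholders.

```python
def gene_list_overlap(list_a, list_b):
    """Compare two gene lists; return shared/unique genes and the Jaccard index."""
    a, b = set(list_a), set(list_b)
    union = a | b
    shared = a & b
    # Two empty lists are trivially identical.
    jaccard = len(shared) / len(union) if union else 1.0
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": jaccard,
    }


# Placeholder lists standing in for two labs' results.
list_a = ["GENE1", "GENE2", "GENE3", "GENE4"]
list_b = ["GENE1", "GENE2", "GENE5"]
result = gene_list_overlap(list_a, list_b)
print(sorted(result["shared"]), result["jaccard"])  # ['GENE1', 'GENE2'] 0.4
```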
Through an experiment in which Icoria participated with the NIEHS-funded Toxicogenomics Research Consortium, Hurban says, the team learned that simply setting the same parameters in the software that extracts data from a microarray image can go a long way toward raising confidence levels. “There was a significant increase in correlation of data” across the centers participating in the project when all users set their various software tools to the same parameters, and an even bigger jump in reproducibility when, in a later test, all the data were re-extracted by the same team of people at Icoria using the same software and parameters. “It’s not the software package per se,” Hurban says. “It really has more to do with how you set the parameters” governing traits such as acceptance of outliers.
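Why extraction parameters matter can be seen in a toy example. The sketch assumes a hypothetical spot-summarization scheme (discard pixels lying more than a chosen number of median absolute deviations from the spot median, then average the rest); it is not the algorithm of any particular image-analysis package. Two labs feeding the same pixels through it with different outlier cutoffs report different intensities for the same spot.

```python
import statistics


def spot_intensity(pixels, mad_cutoff):
    """Summarize one spot: drop outlier pixels, then average the rest.

    Outliers are pixels more than mad_cutoff median absolute deviations
    (MADs) from the spot median. This scheme is a hypothetical example.
    """
    med = statistics.median(pixels)
    mad = statistics.median([abs(p - med) for p in pixels])
    if mad == 0:
        kept = list(pixels)  # no spread to judge outliers against
    else:
        kept = [p for p in pixels if abs(p - med) <= mad_cutoff * mad]
    return statistics.mean(kept)


# The same spot, with one bright stray pixel (160), processed by two labs.
pixels = [100, 102, 98, 101, 99, 160, 97, 103]
print(spot_intensity(pixels, mad_cutoff=3))   # 100
print(spot_intensity(pixels, mad_cutoff=50))  # 107.5
```

With a cutoff of 3 the stray pixel is discarded and the spot reads 100; with a cutoff of 50 it is kept and the spot reads 107.5. Nothing about the sample changed, only the parameter.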
Another potentially tricky step is in normalizing data once it’s been collected from various sources. Whether results are generated from different platforms, different labs, or even just different experiments, “if you want to compare data from one experiment to another, you have to normalize them to a common scale,” says O’Connell at Insightful. “You have to have enough careful normalization of the data so that you’ve removed the extraneous variability to bring those data to a level footing where you can make comparisons and combine those data.”
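One widely used way to put several arrays on a common scale is quantile normalization, which forces every array to share the same intensity distribution. The sketch below shows that generic technique, not necessarily what Insightful or any consortium uses. It assumes all arrays measure the same probes in the same order; comparing different platforms would first require matching probes to common genes. Ties are broken by position, a simplification.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array).

    Each array's k-th smallest value is replaced by the mean of the k-th
    smallest values across all arrays, so every array ends up with the
    same distribution. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must measure the same probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity at each rank.
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays, three probes each, on different intensity scales.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
for row in quantile_normalize(arrays):
    print(row)
# [5.5, 1.5, 3.5]
# [3.5, 1.5, 5.5]
```

Each probe keeps its rank within its own array, but both arrays now draw from the same set of values, which is what lets their intensities be compared directly.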
The data challenge is a major motivator for settling on standards, according to Thompson at FDA. “People are realizing that you’re going to have to look across laboratories,” she says, and they need to feel a certain amount of trust for the array data being generated and entered in public repositories. “It’s really coming to a critical mass as more data is generated and there’s more value added to comparing more data.”
Tackling the Problem
That critical mass has led to a number of standardization initiatives, mostly large consortia gathering both public- and private-sector scientists. Several run under the auspices of federal agencies: NIST plays host to the External RNA Controls Consortium and has recently begun the Gene Expression Metrology Consortium; NIEHS, part of NIH, funds the Toxicogenomics Research Consortium; and FDA plays an active role through the MAQC, run by Leming Shi. Having government agencies set up these consortia gives participants (especially vendors) a neutral place to meet and find common ground. While conventional wisdom says manufacturers would have no interest in making their products compatible with or comparable to competitors’ products, array vendors have by all accounts embraced the standardization efforts. “It’s very much a team environment,” says Burrill at ABI.
With so many such efforts in place, the field seems destined for standardization. But that raises an obvious worry: what happens if these groups come up with competing standards?
“If you can just get everyone to agree on one standard, that’s great,” says Insightful’s O’Connell. “But if it gets fractured and fragmented, you’re in exactly the same position that we are now. I don’t know how that’s going to play out.”
Most consortium participants are optimistic that no standards battle is brewing. So many of the same people belong to the various groups, says Salit at NIST, that the overlap “enforces a separation of goals.”
Indeed, each consortium has its own take on the standardization challenge for microarrays. The External RNA Controls Consortium, for instance, aims to develop “a reference set of RNA molecules and protocols for using them as spike-in controls,” Salit says, adding that the consortium is also working on basic informatics and analysis tools to go along with the reference set.
The MAQC, on the other hand, has a primary goal of developing “a set of guidelines for determining the acceptable performance on a microarray,” says GE’s Sendera. That makes sense for an FDA-sponsored group, notes Pat Hurban at Icoria. The agency wouldn’t try to dictate a certain platform or protocol, he says, so what FDA wants most to figure out is “to what extent they can put performance standards” around these experiments.
The MAQC is moving rapidly and results could come out in a matter of months. “We completed the pilot study” a few months ago, says Sendera. The last of the experiments was slated to begin in August. “We’re anticipating several weeks for running the microarrays — each vendor is running them across multiple sites,” he adds. “We need to collect and interpret the data, and then compare between platforms.”
Meanwhile, the Toxicogenomics Research Consortium, a group of seven research centers using arrays, focused on studying differences across a dozen or so array platforms (including the homebrew variety) to figure out where the most variability occurs and where the chances are best for improving reproducibility. “When we started the consortium in 2001, we were being responsive to a stated need by researchers who use microarrays to study disease processes. Researchers want standards in order to minimize variation,” says Brenda Weis, who coordinates the consortium.
Pat Hurban, who oversaw Icoria’s involvement with the Toxicogenomics Research Consortium, says his colleagues were pleased with the results they saw. “When we analyzed all of the work … we found that there’s actually a pretty nice degree of comparability across all of those platforms.”
The Gene Expression Metrology Consortium is a new initiative spearheaded by Salit at NIST. Salit, who has spent his career helping firm up measurement standards for different industries, says NIST “is really hoping to be able to contribute to the biosciences in the 21st century as it did to the physical sciences in the 20th. We’re at the beginning, if you will, of quantitative bioscience.”
In Salit’s mind, standardization is just the beginning for turning microarrays into a mature, reliable technology. “It’s the right thing to be focusing on right now,” he says, “but even if we had the right standards today I’m not certain we’d have microarray measurements of known quality.” What he envisions down the road is bringing the technology to a point where users could provide an accurate margin of uncertainty with their data — ‘this result, +/- 0.6,’ for instance. That’s “the holy grail,” Salit says. “That is where a mature technology lives.”
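A first step toward Salit’s “this result, +/- 0.6” can be sketched from replicate measurements alone. The sketch reports the replicate mean with an expanded uncertainty, taken as a coverage factor k = 2 times the standard error of the mean, a convention borrowed from general metrology practice. The replicate values are hypothetical. This captures only replicate-to-replicate scatter; a full uncertainty statement of the kind Salit describes would also have to account for platform, protocol, and analysis effects.

```python
import math
import statistics


def report_with_uncertainty(replicates, coverage_factor=2.0):
    """Return (mean, expanded uncertainty) from replicate measurements.

    Expanded uncertainty = coverage_factor * standard error of the mean.
    Only replicate scatter is captured; other error sources are ignored.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to estimate spread")
    mean = statistics.mean(replicates)
    standard_error = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, coverage_factor * standard_error


# Hypothetical log2 expression ratios for one gene from four replicate arrays.
replicates = [2.1, 2.5, 1.9, 2.3]
mean, expanded = report_with_uncertainty(replicates)
print(f"{mean:.2f} +/- {expanded:.2f}")  # 2.20 +/- 0.26
```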
STANDARD STEPS, HERE AND NOW
Standards may be months or years away for the average array user. In the meantime, though, experts say there are several steps you can take right now to improve the reliability of your microarray experiments. Their advice:
Closely monitor RNA quality. Manufacturers have different ways of helping you assess it, and knowing how sound your RNA is goes a long way toward a more reproducible experiment. Even slightly degraded RNA can significantly reduce accuracy.
Follow standard protocols. “We put a lot of effort into designing our protocols and optimizing them to work with our system,” says John Burrill at Applied Biosystems. You risk losing a basis for comparability “if you make a change in the protocol and someone else doesn’t,” he adds.
Include replicates in your experiment. “Take care on the experimental design to make sure [you] have enough replicates,” says Michael O’Connell, director of life sciences at Insightful. Enough replicates for each condition will give you “a better handle on the signal,” he adds.
Adhere to MIAME. Compliance with the MIAME (Minimum Information About a Microarray Experiment) standards “is becoming more and more critical as part of the peer-review process,” says Erik Bjeldanes at Agilent. “Some of the top journals are already mandating MIAME compliance.”
Check vendor controls. Many array manufacturers already provide some sort of indication to the user of how the experiment has run. “We provide a suite of controls to tell you how the individual steps are performing,” ABI’s Burrill says. “There are some blank controls, some spike-in controls. If they’re within the parameters then you know your microarray worked.”
Run a known sample. Use a sample that you already understand and for which you have good data as a benchmark. Scientists “should really know they can get a certain result with a standard sample,” says FDA’s Karol Thompson.
Be careful setting parameters for data extraction. Pat Hurban of Icoria says that when his team participated in a multi-platform array study, one factor stood out: data comparability rose noticeably when the parameters of the analysis programs were set to the same values. “The data came out significantly more consistent,” he says.
Try a cross-platform analysis package. Xianghong Zhou, an assistant professor in the molecular and computational biology department at the University of Southern California, says his team has come up with a “software package developed to integrate and analyze microarray data from different platforms.” Zhou, who presented the tool at this year’s ISMB conference, says it is free for academic users and can be found at http://zhoulab.usc.edu/ArrayAnalyzer.htm.