Participants in the second phase of the MicroArray Quality Control Consortium study are prepping for a face-to-face meeting next month in which they will review initial analysis results from several large microarray data sets with the goal of arriving at a consensus on the best methods to identify predictive biomarker signatures for use in clinical applications.
MAQC I, which addressed general issues of microarray platform reproducibility, published its results in Nature Biotechnology last September, and spurred a great deal of conversation and debate in the field regarding statistical methods for normalization, classification, and generating gene lists [BioInform 12-01-06].
Even though some of those issues are still under debate, MAQC is moving on, and the focus for Phase II has shifted from the use of arrays in basic research toward their use in clinical trials, safety testing, and other applications that are likely to be subject to regulatory oversight.
Federico Goodsaid, senior staff scientist in the genomics group at FDA’s Office of Clinical Pharmacology, told BioInform via e-mail this week that the primary informatics goal for MAQC II is “the study of algorithms applied to the generation of classifiers from microarray data. We would like to understand better the process by which algorithms are selected and classifiers are validated,” he said.
Even though issues associated with the reproducibility of microarray data “have some overlap” with those associated with generating predictive signatures, there are also many differences, Goodsaid said.
“In one sense, the former controversy is kind of dying down just because we’ve all focused our attention now on what is probably a more important problem for the scientific community at large, which is, ‘Can we build predictive models based on microarrays that will generalize out to field use and actually be reproducible?’” said Russ Wolfinger, director of scientific discovery and genomics at SAS, which will host the MAQC II meeting at its headquarters in Cary, NC, May 24-25.
“Some of the same concerns are coming up, but it’s going to be more in the context of reproducible predictions of some outcome of interest, rather than just the reproducibility of some gene list that you picked up from your current microarray study,” he said.
While MAQC II promises to be more complex than MAQC I, it is so far less contentious, Wolfinger said.
“In MAQC I we had some knock-down drag-outs,” he said. “People were getting angry and were getting their feelings hurt and everything, but we haven’t had any of that from what I can see so far in Phase II.”
One reason for this, he said, is that Phase II is on a much larger scale than Phase I. “In Phase I we just had the one data set. It was kind of boring. It was just two reference samples that were radically different, whereas now we’ve got real live data, real samples, and we’re signing non-disclosure agreements to protect the confidentiality of the data, so we’re really kind of bringing the technology right to where it’s being used,” he said.
MAQC II has four primary working groups: Clinical, Toxicogenomics, Titration, and Regulatory Biostatistics.
The last of these, Regulatory Biostatistics, is crucial to the long-term goals of the effort, which aims to bring microarray analysis in line with the expectations of regulatory statisticians.
For example, the regulatory group has asked the data-analysis groups to submit so-called “statistical analysis plans,” Wolfinger said. “This is a step toward the way they do things officially in a regulatory framework. Any clinical trial data that’s going to be planned and submitted, the sponsor has to write up a nice big plan about what they intend to do and how they’re going to do it, and it’s all designed to keep everything on the up and up, so you don’t have people looking at their data after the fact and then changing things around so that it looks good,” he said.
“The Regulatory Biostatistics Working Group was started to capture information from MAQC2 activities that we may need to understand in future regulatory discussions,” said Goodsaid. “It is a unique opportunity to examine scientific and statistical activities of MAQC2 from a regulatory perspective.”
The FDA’s National Center for Toxicological Research initially created the MAQC in 2005 in an effort to explore issues surrounding performance, quality, and data analysis for microarrays.
Since then the FDA’s regulatory arm has taken several steps to modify its policies related to genomic data. Last fall, for example, the agency’s Center for Devices and Radiological Health issued a draft guidance encouraging molecular diagnostic companies to file for pre- and post-market approval for so-called in vitro diagnostic multivariate index assays. Earlier this year, the FDA cleared the first such test under the new guidance, Agendia’s MammaPrint, a microarray-based diagnostic for breast cancer recurrence.
It’s likely that the MAQC II results will have some impact on future regulatory decisions related to microarray-based tests and genomic signatures, but it’s still too early in the project to determine what that effect may be.
“There are several independent activities at the FDA connected in one way or another to genomics,” Goodsaid said. “MAQC efforts contribute to the knowledge we will need to have in the future for the reproducible generation and accurate interpretation of genomic data.”
For the time being, most MAQC II participants are “scrambling and working pretty hard this month to have some things ready for the meeting, which is coming up pretty quickly,” Wolfinger said.
At SAS, Wolfinger’s group is running the company’s JMP Genomics software on a “bunch of high-powered servers” that will be running “at times, probably around the clock, just cranking through cross-validation type analyses, where we randomly hold out a set and then predict it and then repeat that whole thing 100 times, and then change the method.”
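The repeated hold-out procedure Wolfinger describes (randomly split off a test set, fit on the remaining samples, predict the held-out samples, repeat many times, then swap in a different method and compare) can be sketched in a few lines of Python. This is a toy illustration on synthetic data with hypothetical helper names, not SAS's actual JMP Genomics pipeline:

```python
# Illustrative sketch only: helper names (nc_fit, holdout_accuracy, etc.)
# are hypothetical, and the data is synthetic, standing in for a real
# microarray expression matrix with a clinical outcome label per sample.
import random

def nc_fit(X, y):
    """Nearest-centroid 'model': mean feature vector per class."""
    centroids = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return centroids

def nc_predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(x, centroids[lab])) for x in X]

def majority_fit(X, y):
    """Baseline 'model': just remember the most common training label."""
    return max(set(y), key=y.count)

def majority_predict(label, X):
    return [label] * len(X)

def holdout_accuracy(fit, predict, X, y, n_repeats=100, test_frac=0.3, seed=0):
    """Repeated random hold-out: split, fit, predict, score, repeat."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = max(1, int(len(idx) * test_frac))
        test, train = idx[:cut], idx[cut:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / len(accs)

# Two well-separated synthetic classes of 40 samples each.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
     + [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(40)])
y = ["responder"] * 40 + ["non-responder"] * 40

# The "change the method" step: rerun the same hold-out loop with a
# different classifier and compare average accuracies.
acc_nc = holdout_accuracy(nc_fit, nc_predict, X, y)
acc_base = holdout_accuracy(majority_fit, majority_predict, X, y)
```

The key point is that each method is judged by its average accuracy on samples it never saw during fitting, so a classifier that merely memorizes the training data (like the majority-class baseline here) scores poorly relative to one that captures real structure.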
SAS is taking an “open-ended” and “brute force” approach that tests various combinations of statistical methods to determine which ones might make the best predictions, Wolfinger said.
“We’re kind of looking forward to cranking through all of those and seeing which types of models and normalization methods seem to work well on the different data sets,” he said. “We don’t even know what to expect.”