The second phase of the Microarray Quality Control project took a step forward in recent weeks as the government-hosted group received its first data sets from donors.
The MAQC evaluators hope to use the data to better assess the classifiers routinely used in biomarker identification, in order to show that molecular signatures are reproducible.
“Phase I showed that different platforms can generate the same biologically equivalent data,” Yu Ling, vice president of gene expression at Panomics, which sells both arrays and RT-PCR assays, told BioArray News this month.
“That was done with very simple samples and people want to know whether this can be applied across platforms in real-life samples and relevant in clinical diagnostics and toxicogenomics.”
“Phase II should show that even with real life samples, you can show how different platforms that use the same set of genes can generate similar results,” he said. Panomics is participating in the second phase of the MAQC project.
Still, while the data is now reportedly “pouring in,” the project faces the challenge of determining exactly how it will evaluate different algorithms and metrics as it looks to create a second suite of publications, similar to the results of the first phase of the project, published in Nature Biotechnology in September (see BAN 9/12/2006).
Much of the early-stage progress in phase II occurred at a meeting last month in Washington, DC, where members for the first time volunteered data sets that will be essential to the completion of the project.
“The result of the meeting is that we have been able to get the help of dozens of different companies in terms of both data sets as well as classifiers,” said Federico Goodsaid, an MAQC project leader and a scientist in the Genomics Group at the US Food and Drug Administration’s Center for Drug Evaluation and Research.
“We are going to have data shared both from clinical as well as toxicogenomic datasets and it also has a very intensive participation from many different government, academic, and industry labs,” he told BioArray News this month. Goodsaid added that the agreement put to rest the project’s concern that it would not be able to get access to enough quality data to complete phase II.
“Just like when it came to doing this with [phase I of the] MAQC [project], you always wonder whether there is an issue with sharing data or different types of efforts, but people come together because they feel that something needs to be fixed,” Goodsaid said.
“In this case it has to do with [researchers] generating those signatures in the end. What is it about the way those signatures are generated that we need to worry about? This will be a chance to examine that [question] at length,” he added.
Three Working Groups
Prior to the November meeting, the MAQC project members were concerned enough about a potential lack of data that a small working group was created and devoted entirely to combing through the large corpus of data from the first phase of the project to find suitable data sets for the second. Additionally, two larger groups were formed, one to analyze clinical diagnostics data and one to analyze toxicogenomic data.
Roderick Jensen, director of the Biotechnology Center at the University of Massachusetts in Boston and a MAQC project leader, told BioArray News last week that the small working group was formed because of uncertainty about the amount and quality of the data the project would be working with.
Now, however, “data is just pouring in” with an emphasis on array data generated in cancer research, Jensen said. “There’s a study out of Little Rock, Ark.; 150 DVDs of Affymetrix .cel files. That’s got to be at least 400 GeneChip arrays worth of data.”
Jensen said that Leming Shi’s lab at the National Center for Toxicological Research, which formally hosts MAQC, is busy uploading the data to its website, and that it should be freely available to MAQC members by late January 2007.
Attacking the Data
The large-scale availability of data has been a blessing for the MAQC project, which includes participants from academia, the government, and industry. At the same time, as the phase II data repository grows, MAQC project members acknowledge that their plans for analyzing the data in order to fulfill the goals of the second phase remain sketchy at best.
According to Damir Herman, a MAQC project leader and scientist at the National Center for Biotechnology Information, one way the group has sought to address the problem of sifting through the data has been to prioritize certain data sets, which will then be assigned to the project’s working groups.
“We have a few nominated data sets for the next phase, but it’s going to be very difficult to prioritize data sets for each group,” Herman told BioArray News this month. “We’ll decide by the end of the year which data sets will become priorities for which groups.”
Herman said that the MAQC’s ultimate goal will be to “propose metrics that would be unambiguous and useful in clinical studies, particularly in cancer research.” Still, the project has yet to decide how exactly it will produce those metrics.
One suggestion from Herman is to use widely established algorithms to analyze the data sets and to work from whatever patterns emerge in order to identify, and later evaluate, useful metrics. But, according to Herman, that approach will take a lot of time and effort, and it is not clear that it is the best path forward.
“The short and honest answer is that we don’t know” how MAQC is going to meet the goals of phase II, Herman said. “A traditional means of attack to the problem, like heavy statistics, is not going to work,” he explained. “We would like to apply every possible algorithm to each dataset. Then, the pattern should reveal itself eventually.”
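Herman's brute-force strategy, applying many algorithms to each data set and looking for consistent patterns, can be sketched in miniature. The snippet below is a hypothetical illustration, not MAQC code: it builds a small synthetic two-class "expression" data set and cross-validates two toy classifiers (nearest-centroid and 1-nearest-neighbor), the kind of head-to-head comparison the working groups would run at far larger scale. All data, names, and parameters here are invented for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the synthetic data is reproducible

def make_dataset(n_samples=60, n_genes=20):
    """Synthetic two-class 'expression' data: class 1 carries a mean shift
    in the first five genes, mimicking a differential signature."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        profile = [random.gauss(1.0 if (label and g < 5) else 0.0, 1.0)
                   for g in range(n_genes)]
        X.append(profile)
        y.append(label)
    return X, y

def _sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose per-gene mean profile is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda l: _sqdist(centroids[l], x))

def one_nn(train_X, train_y, x):
    """Predict the class of the single closest training sample."""
    idx = min(range(len(train_X)), key=lambda i: _sqdist(train_X[i], x))
    return train_y[idx]

def cross_val_accuracy(clf, X, y, k=5):
    """Mean accuracy over k interleaved train/test folds."""
    n = len(X)
    accs = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        tr_X = [X[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        correct = sum(clf(tr_X, tr_y, X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return mean(accs)

X, y = make_dataset()
for name, clf in [("nearest-centroid", nearest_centroid), ("1-NN", one_nn)]:
    print(f"{name}: {cross_val_accuracy(clf, X, y):.2f}")
```

Scaling this pattern up, many algorithms, many data sets, many metrics beyond plain accuracy, is precisely where the "pattern should reveal itself," and also where the time and effort Herman mentions come in.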
Despite the experimental nature of phase II, biotech companies are still eager to participate. Jensen said that while the first phase of the project drew heavily on the involvement of microarray vendors, which helped create data that was shown to verify concordance between different platforms, the second phase would likely attract greater interest from bioinformatics companies because of its emphasis on statistical analysis.
In September, Weida Tong, director of the FDA's Center for Toxicoinformatics at the National Center for Toxicological Research and a MAQC organizer, told BioArray News that pharmaceutical companies were also likely to play a more prominent role (see BAN 9/19/2006).
“We see a lot of the voluntary genomics data submissions from pharmaceutical companies where they develop a molecular signature and try to predict treatment outcomes” based on that signature, Tong said at the time. “These types of molecular signatures are quite frequently used in the drug-development stage. So there are issues with FDA on how to evaluate signatures developed by pharmaceutical companies,” he added.
“We need to have a strong presence from pharmaceutical companies as we begin to decide how this project will work,” Tong said.
Despite the shift in focus toward pharma and bioinformatics companies, array users and companies that sell other research tools, such as RT-PCR assays, are likely to benefit from MAQC phase II should it prove successful.
Panomics’ Ling said that the completion of phase II could benefit businesses like his own because it would let them demonstrate to potential customers that their platforms produce results equivalent to those of rival platforms. “This way if an array platform comes up with a classifier, we can use their classifier to demonstrate the kind of data we can produce on our platform,” he said.