As a first step in familiarizing itself with the nuances of microarray data, the FDA’s Office of Testing and Research has embarked upon two separate gene expression database projects. One, in collaboration with Iconix, will introduce FDA reviewers to the basics of microarray data via the company’s DrugMatrix toxicogenomics database. A second project, with Schering-Plough and Affymetrix services provider Expression Analysis, will create an internal “mock submission” database for gene expression data.
The outcome of the two database projects will shape a draft guidance document the FDA is preparing on the submission of microarray data.
Getting Past the Fear Factor
The FDA has had access to the DrugMatrix database since March, when it began a collaboration with Iconix Pharmaceuticals to gain hands-on experience with toxicogenomics data and tools. The agency is boning up on the database as part of an effort to correlate the content and format of gene expression microarray data with standard toxicology and pharmacology study results. Iconix is training FDA reviewers on quality control and quality assurance for microarray data generation, as well as the analysis of data across multiple microarray product platforms, and the validation of biomarkers from integrated chemogenomic datasets.
The database contains findings on approximately 600 compounds, tested at multiple doses and multiple time points. Gene expression data is linked to information on pharmacology, histopathology, clinical chemistry, and toxicology related to those compounds, to provide a “contextual reference set” that lets FDA reviewers compare new findings with known results, said Kurt Jarnagin, vice president of biological sciences and chemical genomics at Iconix.
Bringing it in House
The goals of the planned internal gene expression database differ somewhat from those of FDA’s project with Iconix. In this effort, FDA, Expression Analysis, and Schering-Plough will build a framework to support the “mock submission” of data from a drug project Schering opted to discontinue. “We’re taking that data – which includes microarray data, histology data, clinical chemistry data, and phenotype data – and helping FDA to understand the appropriate format, content, and context of microarray-based submissions,” said Steve McPhail, CEO of Expression Analysis.
Pilot submissions to the database are expected to begin in June, and the project is scheduled for completion in October, McPhail said. A final summary report on the project is planned for November.
The project will address a laundry list of issues, including laboratory infrastructure, sample processing and array QC/QA issues, and experimental design and replication, but informatics-related questions make up the majority of topics. Data management issues such as format and file structures, linkage mechanisms between microarray data and other datasets, statistical analysis systems and software, and inference and modeling methods will all be examined as part of the project, McPhail said.
Expression Analysis will use Affy’s MAS 5.0 software to analyze the data, but “we may use other methods as well,” McPhail said. While the company has two years’ experience processing Affymetrix data, “the linkage mechanisms are not something we’ve worked on in the past,” he noted, so Expression Analysis is turning to its sister company, regulatory informatics firm Constella Group, to handle the integration between microarray data and other clinical information.
Initially, the project will follow CDER’s current guidance recommendations for regulatory submissions in electronic format, with the goal of identifying areas that need to be modified or redefined. This guidance stipulates that datasets be submitted as a SAS transport file of less than 25 MB per file, with data variable names of no more than eight characters, data elements defined in data definition tables, and variable names and codes consistent across studies.
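To make those constraints concrete, the following is a minimal Python sketch of how a submitter might pre-check a dataset against the rules as the article describes them. It is not an FDA or CDER tool. The function names, the sample variable names, and the reading of "25 MB" as 25 × 1024 × 1024 bytes are all assumptions made for illustration, and "consistent across studies" is interpreted here as "the same variable name carries the same definition in every study."

```python
import os

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024
# Variable names may be no more than eight characters.
MAX_NAME_LEN = 8


def within_size_limit(size_bytes):
    """True if a file of this size is under the 25 MB per-file limit."""
    return size_bytes < MAX_FILE_BYTES


def file_within_limit(path):
    """Apply the size check to a file on disk."""
    return within_size_limit(os.path.getsize(path))


def long_names(variable_names):
    """Return the variable names that exceed eight characters."""
    return [name for name in variable_names if len(name) > MAX_NAME_LEN]


def undefined_names(variable_names, definition_table):
    """Return variables that have no entry in the data definition table."""
    return [name for name in variable_names if name not in definition_table]


def conflicting_definitions(study_tables):
    """Return names whose definition differs between studies, sorted.

    study_tables maps a study label to that study's definition table,
    itself a dict of variable name -> definition text.
    """
    first_seen = {}
    conflicts = set()
    for table in study_tables.values():
        for name, definition in table.items():
            # setdefault records the first definition seen for each name.
            if first_seen.setdefault(name, definition) != definition:
                conflicts.add(name)
    return sorted(conflicts)


# Hypothetical definition tables for two studies.
study_a = {"SUBJID": "Subject identifier", "ALT": "Alanine aminotransferase (U/L)"}
study_b = {"SUBJID": "Subject identifier", "ALT": "ALT, fold change vs. control"}

print(long_names(["SUBJID", "PROBESETID"]))
print(undefined_names(["SUBJID", "DOSE"], study_a))
print(conflicting_definitions({"A": study_a, "B": study_b}))
```

The eight-character limit is the kind of rule that microarray data can strain, since probe set identifiers and sample annotations often run longer than clinical variable names.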
The submitted array data will include raw data files from image analysis. In addition, a summary report will be provided to describe normalization, data processing, and statistical analysis steps. It is expected that these guidelines will be extended to improve compatibility with microarray data as the project progresses.
The FDA’s database activities are not without precedent. A project spearheaded by the International Life Sciences Institute consortium and the European Bioinformatics Institute has been developing a centralized, public gene expression database for over a year. It is built on the EBI’s ArrayExpress gene expression database, with the intention of linking toxicogenomics data from multiple platforms. Data input is ongoing, and the complete database is expected to come online by the first quarter of 2004.
“The intent of the ILSI effort was to establish some public offering that could be helpful in developing standards,” said Pfizer’s Mattes, who is on the ILSI database working group.
Building on the MIAME (minimum information about a microarray experiment) guidelines, the ILSI/EBI project has drafted a revised version of the standard called MIAME/Tox that aims to establish some consensus on the minimal descriptors for array-based toxicogenomics experiments (available at http://www.ilsi.org/committees/hesi/genomics/MIAME1.1ToxCircDRAFT-rev3.DOC).
Judging by the near-universal acceptance of the MIAME standard in the microarray world, it’s likely that MIAME/Tox will gain broad support within the toxicogenomics community. However, it is still in draft form, and has not been endorsed by anyone yet, least of all the FDA. CDER’s Office of Information Management coordinates all of its standardization efforts, but according to Mattes, “there needs to be some communication between that group and anything going on in terms of a toxicogenomics database.”
Indeed, the reigning CDISC-based guidance at CDER differs from the proposed MIAME/Tox standard in a number of ways. MIAME/Tox proposes a more restrictive vocabulary, for example, with a field proposed for each clinical chemistry test. MIAME/Tox also collects information on in vitro experiments, which the standing CDER guidelines don’t require, but it does not collect information on drug plasma levels, which the CDER guidelines currently do.
But MIAME — along with its accompanying data format, MAGE — is only the first piece in a much larger set of standards that need to be developed for a fully functional toxicogenomics data submission platform.
In addition to a dearth of standards for experimental design, normalization, and a “universal” RNA, “there is no standard yet for analysis,” said Mattes. “So, if somebody says, ‘I’ve identified the regulated transcripts after this particular treatment,’ what’s the best way [to verify that analysis]? It’s a huge question.”
While the ILSI database project initially set out to address these standardization issues, Mattes said the group is far from a solution. “We have discussed and compared analysis, but resolved them? That’s a definite no,” he said.
Perhaps, a Meeting?
The ILSI/EBI group has made some headway into the very issues that FDA plans to address with its own database, but there has been no formal involvement between the two groups so far, Mattes said. However, he added, “This may be the time for it. I’m sure, coming out of the [subcommittee] meeting, it would be a time when FDA would be interested in doing that, and I know we would be too.”
McPhail said that it is early in the process. “The agency is just trying to get [its] arms around format, content, and context at this point and time, so it’s probably too soon to tell what impact this will have on the future of microarray testing in support of INDs and NDAs,” he said.