Most people in the life science research community think of toxicogenomics as a microarray-driven exercise, but a public database currently under development plans to expand the discipline well beyond gene expression — to include proteomics, metabonomics, regulatory network information, and conventional toxicology data — in a resource that will support “systems toxicology.”
The database, a ten-year project called the Chemical Effects in Biological Systems (CEBS) knowledge base, has been under development for two years so far. Coordinated by the National Institute of Environmental Health Sciences’ National Center for Toxicogenomics, the resource has not yet publicly released any data, “but we hope to be able to load data in a big way soon,” said Michael Waters, assistant director for database development at NCT. Version 1.3 of CEBS, which offers a framework for researchers to upload microarray data sets, was released in mid-April.
Waters said the project is farthest along in the area of gene expression: its creators have compiled a “limited amount” of data from standardization experiments that were done through the Toxicogenomics Research Consortium, an extramural program comprising five academic institutions, as well as from Paradigm Genetics, a contractor for the project. Next, Waters said, the plan is to link that expression data with toxicology data captured by the Toxicology Database Management System hosted by the NIEHS National Toxicology Program.
“It’s a microarray database at the present time,” Waters said, “but the goal is much broader.”
Indeed, plans call for the database to incorporate information from multiple species on global gene expression, protein expression, metabolite profiles, and associated chemical-induced effects. In addition, functional pathway and network information will be included, and the plan is to make the resource searchable by gene or protein sequence, compound, structure, toxicity end point, pathology end point, dose, time, and tissue condition. The ultimate goal is to enable BLAST-like global queries using sequence data from microarray probes that will return information on genes, gene families, metabolic and toxicological pathways, and phenotypic response information (see p. 4 for a schematic of the CEBS framework).
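To make the idea of a resource "searchable by" compound, dose, time, and tissue more concrete, here is a minimal Python sketch of a multi-field lookup. It is an illustration only and does not reflect the actual CEBS schema or query interface: the `StudyRecord` fields, the `search` helper, and every value in `RECORDS` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical record; field names are assumptions, not the CEBS schema."""
    gene: str
    compound: str
    tissue: str
    dose_mg_per_kg: float
    time_hours: float
    toxicity_endpoint: str


def search(records, **criteria):
    """Return the records whose fields equal every supplied criterion."""
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in criteria.items())
    ]


# Made-up example data, for illustration only.
RECORDS = [
    StudyRecord("GENE_A", "compound_x", "liver", 50.0, 6.0, "necrosis"),
    StudyRecord("GENE_A", "compound_x", "liver", 150.0, 24.0, "necrosis"),
    StudyRecord("GENE_B", "compound_y", "kidney", 10.0, 24.0, "none"),
]

# Combine two search dimensions: compound and tissue.
hits = search(RECORDS, compound="compound_x", tissue="liver")
```

Exact-match filtering is the simplest possible design. The sequence-based global queries described above would need a similarity search rather than equality tests.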
Waters stressed that this vision is a long-term goal, and that the informatics challenges of making it a reality are formidable. Nevertheless, the project has laid the foundation for the system in the form of SysBio-OM, a comprehensive object model built on the MAGE-OM and MIAPE (formerly Pedro) object models for microarray and proteomics data, respectively, as well as on the caBIO object model from the National Cancer Institute. SysBio-OM was developed by researchers at Science Applications International Corp., a contractor responsible for core database development for CEBS. The model extends MAGE-OM and MIAPE to represent protein expression data, protein-protein interaction data, and metabolomics data, and is publicly available through the CEBS website (http://cebs.niehs.nih.gov/).
In addition, Waters said that the project is using the concept of “phenotypic anchoring” as the basis for integrating the many data types that CEBS will contain. In this approach, certain biomarkers would act as flags or “anchors” that signal a particular level of phenotypic damage. Researchers could start with the biomarker to explore patterns of gene expression or protein expression, and also to discover parallel, co-expressed markers.
“That allows you to back down in terms of dose and earlier time to see whether events that might have paralleled those changes that you see when you get the toxic outcome are seen earlier, and then one can begin to target proteins that may be potential biomarkers,” Waters said. He added that the phenotype-based data integration concept is “an unconventional approach” that still needs to be verified experimentally, “but in terms of how we would think about integrating, it would be around a global approach to the extent that we can,” he said.
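The "back down in dose and time" reasoning can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method NCT uses: the `Observation` structure, the `early_candidates` function, the twofold cutoff, and all of the values in `OBS` are hypothetical. The sketch marks conditions where the anchor biomarker crosses a threshold as toxic, collects the genes changed in all of those conditions, and then reports which of those genes are already changed at lower-or-equal doses and earlier-or-equal time points where the anchor has not yet fired.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    """Hypothetical measurement at one dose and time point."""
    dose: float
    time_hours: float
    anchor_level: float   # level of the anchoring biomarker
    fold_changes: dict    # gene name -> fold change versus control


def is_changed(fold_change, cutoff):
    """Treat a gene as changed if it is up or down by at least `cutoff`-fold."""
    return fold_change >= cutoff or fold_change <= 1.0 / cutoff


def early_candidates(observations, anchor_threshold, fold_cutoff=2.0):
    """Map each candidate gene to the pre-toxic (dose, time) points where it changes."""
    toxic = [o for o in observations if o.anchor_level >= anchor_threshold]
    pre_toxic = [o for o in observations if o.anchor_level < anchor_threshold]
    if not toxic:
        return {}

    # Genes changed in every anchored (toxic) condition form the signature.
    signature = set.intersection(*[
        {g for g, fc in o.fold_changes.items() if is_changed(fc, fold_cutoff)}
        for o in toxic
    ])

    candidates = {}
    for obs in pre_toxic:
        # Keep only conditions that sit "below" some toxic condition.
        if not any(obs.dose <= t.dose and obs.time_hours <= t.time_hours for t in toxic):
            continue
        for gene in signature:
            if is_changed(obs.fold_changes.get(gene, 1.0), fold_cutoff):
                candidates.setdefault(gene, []).append((obs.dose, obs.time_hours))
    return candidates


# Made-up example data, for illustration only.
OBS = [
    Observation(10.0, 6.0, 0.1, {"GENE_A": 1.1, "GENE_B": 2.5, "GENE_C": 1.0}),
    Observation(50.0, 6.0, 0.3, {"GENE_A": 1.2, "GENE_B": 3.0, "GENE_C": 0.9}),
    Observation(50.0, 24.0, 0.9, {"GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 1.1}),
]

result = early_candidates(OBS, anchor_threshold=0.5)
```

In this toy data set, GENE_B is changed before the anchor crosses its threshold, which would make it a candidate early marker. A real analysis would need replicates and statistical testing rather than a fixed fold-change cutoff.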
Yet Another Toxicogenomics Database?
CEBS is built on the same computational infrastructure used by the NCI’s Center for Bioinformatics, and has a “sister effort” in the ToxMIAMExpress and ArrayExpress toxicogenomics databases currently under development at the EBI, in partnership with the International Life Sciences Institute’s Genomics Committee. Waters said that NCT is collaborating with the EBI to set up a “dual pipeline” between the two organizations in order to exchange data.
Waters said that CEBS also complements similar federal efforts underway within the United States, such as the ArrayTrack toxicogenomics database used for the Food and Drug Administration’s internal research [BioInform 07-14-03], and the computational toxicology initiative at the Environmental Protection Agency [BioInform 01-12-04]. “Basically, what we’ve tried to do with these groups is set up a set of standards discussions so that our database standards and their database standards will be compatible,” he said.
Data in CEBS will also be “similar” to that found in some commercial databases, Waters said, such as Gene Logic’s ToxExpress database and Iconix’s DrugMatrix. “We anticipate that we’ll have the same type of data that those databases have for multiple chemicals,” he said, “but we hope that, in addition, we’ll be able to provide fairly extensive documentation of the toxic outcomes, both in terms of histopathology and other conventional toxicologic parameters, and to be able to bring together the proteomics and, ideally, the metabonomics.”
Waters said that the commercial offerings are currently well beyond the nascent version of CEBS, “but I think our vision is to go beyond what those databases currently offer, and, perhaps most importantly, make the information public.”
The commercial sector, however, doesn’t anticipate being left in NCT’s dust. William Mattes, senior science director at Gene Logic, said that the company is keeping an eye on the project, and expects that it will be at least two or three years before the first CEBS data becomes available. While the project’s “lofty goals” are admirable, Mattes said, integrating the variety of data types involved is “problematic,” and Gene Logic does not have plans at this time to incorporate proteomics or metabonomics data into its content offering.
A further challenge for the CEBS project, Mattes said, is its plan to gather gene expression data from multiple microarray platforms. While that will allow NCT to “cast a wider net” to gather more information, Mattes noted that cross-platform integration remains an unsolved problem within the gene expression community, so the QC hurdles of such an effort are considerable.
Even if CEBS were to catch up and gather as much toxicogenomics data as Gene Logic has on hand, Mattes said he thinks the company is unlikely to meet the same fate that genomic sequence data providers did once a flood of public data came online. The complexity of gene expression data, and the high degree of variation that exists between platforms and even between experiments, “insulates us from the commoditization that occurred with sequence data,” he said. In fact, the company is in discussions with members of the NCT team about possibly releasing some of its data through CEBS. Mattes said that Gene Logic is considering the possibility, but has not yet arrived at a decision.
Slow but Steady
NCT, meanwhile, is taking a “stepwise” approach toward its long-term vision of a systems toxicology resource. The microarray component of the database is relatively mature, and the project is just now implementing a system for capturing proteomics data from 2D gels, Waters said. Metabonomics will present more of a challenge in future efforts, he said, because standards are just beginning to emerge in that area, but NCT is working closely with Paradigm, EPA, and IBM on a project to integrate metabonomics data with other data in CEBS.
It’s likely that the complete systems toxicology picture will take the full ten years slated for the project to come into focus, but Waters said it should be worth the wait. “Just understanding the component parts of the biological system and their function is probably not going to be adequate in toxicology; we’re going to need to stress-test the system,” he said. “I think it will be a partnership between basic research and what you might call applied research, where we’re actually perturbing the systems with drugs and chemicals, and combining those two types of research in an effort that will eventually lead to systems toxicology.”