For the past year and a half, the Mount Desert Island Biological Laboratory in Salisbury Cove, Maine, has been building a Comparative Toxicogenomics Database linking information about environmental chemicals and diverse sets of protein and gene sequence data, focusing largely on agents affecting human health.
The lab is “looking at the genes we know are involved in chemical responses, and looking at the sequences of these genes and proteins in different vertebrates and invertebrates, and trying to correlate toxicity with function, with sequence,” said Carolyn Mattingly, director of bioinformatics for MDIBL.
Currently, there is no other database that gathers data about genes and proteins that are particularly important to toxicology, said Mattingly. The CTD contains data on “a lot of drugs,” as well as drug precursors, she said. There are now “well over 57,000 chemicals” in the database, as well as information from “almost 50,000” species of animals, she added.
The CTD is intended for molecular biologists interested in environmental health, toxicologists doing molecular research, and people involved in proteomics and genomics research, Mattingly said. “We have [also] had several pharmaceutical and biotech companies check in,” but the MDIBL has not yet had “extensive” discussions with them, she added.
For the most part, the CTD’s data sources (published literature and sequences from NCBI, GenBank and PubMed) serve as repositories of data that do not allow users to pose complex queries, said Mattingly.
“It’s impossible at this point to go to any resource and say, ‘I’m interested in this particular chemical and I want to know the genes that are affected by it,’” said Mattingly. Users can ask the CTD questions that contain gene names, chemical names, and gene ontology, along with “a number of other ontologies” often used in biological databases.
For example, the CTD can answer the query: “Which genes that are affected by this chemical are involved in apoptosis?” she said. Or even: “If there is an effect between a protein and a chemical, is the protein binding to the chemical? What’s happening subsequent to that? Is it activating transcription of another gene, which spirals and ends up activating a whole expression pathway?”
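As a rough illustration of the kind of cross-referenced question Mattingly describes, the Python sketch below joins a list of chemical-gene links against ontology annotations. It is not the CTD's schema or interface: the records, the names `chemical_X` and `GENE_A`, and the function `genes_affected_by` are all hypothetical placeholders chosen for the example.

```python
# Toy, made-up records for illustration only; none of this comes from the CTD.

# Each pair means "this chemical affects this gene".
chemical_gene = [
    ("chemical_X", "GENE_A"),
    ("chemical_X", "GENE_B"),
    ("chemical_Y", "GENE_C"),
]

# Ontology terms annotated to each gene (hypothetical annotations).
gene_terms = {
    "GENE_A": {"apoptosis", "transcription"},
    "GENE_B": {"transport"},
    "GENE_C": {"apoptosis"},
}


def genes_affected_by(chemical, term):
    """Return the sorted genes linked to `chemical` that carry ontology `term`."""
    return sorted(
        gene
        for chem, gene in chemical_gene
        if chem == chemical and term in gene_terms.get(gene, set())
    )


# "Which genes that are affected by this chemical are involved in apoptosis?"
print(genes_affected_by("chemical_X", "apoptosis"))
```

The point of the sketch is only that the question requires two kinds of records to be linked, chemical-to-gene and gene-to-ontology, which is what a plain sequence or literature repository does not do on its own.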
A CTD query comparing the differential toxicity of dioxin among species, for example, may show different [protein] binding properties among different organisms, Mattingly said. With aligned and annotated protein and gene sequences, users can more easily determine molecular effects responsible for toxicity, she said.
A prototype of the Comparative Toxicogenomics Database became publicly available in November, and it was publicized last week. The MDIBL hopes to add as much information as it can about each compound on its growing list. At this point, “we’re basically just encouraging the community to give us feedback and make it as valuable as possible,” Mattingly said.
The CTD is publicly available at ctd.mdibl.org, and was funded by a grant from the US National Institute of Environmental Health Sciences, one of the National Institutes of Health [see the FY2004 NIH Grants for Pharmacogenomics and Pharmacogenetics by State, as of Jan. 9 in this issue]. The MDIBL welcomes feedback and data sets at [email protected]