Weida Tong, who got his PhD from Fudan University in China, moved to Little Rock, Ark., in 1996 to work for the US Food and Drug Administration’s National Center for Toxicological Research, developing a database of endocrine disruptors.
Seven years later, Tong is still at the NCTR, and in June 2002 was named the director of the new Center for Toxicoinformatics, where his first task has been to develop a way to handle the center’s growing body of toxicology-related microarray data.
Microarray databases are plentiful these days, but the challenge at NCTR is a complex one: Not only do researchers have to store, retrieve, and analyze microarray experiment results and reference this data against the body of genomics, proteomics, and metabonomics data that others at the center are producing, they also have to be able to access the center’s traditional toxicology databases, including a carcinogenicity potency database and the endocrine disruptor database.
The solution that Tong and his colleagues, including former BASF scientist Leming Shi, came up with is called ArrayTrack.
“We saw the demands for microarray data management, so we developed the ArrayTrack infrastructure,” Tong said.
This DNA microarray data management software is Java-based and runs on an Oracle platform. Its structure is like the two hemispheres of the brain: On one side is “Lib,” libraries of genomic, protein, pathway, and other information that mirror data in the public databases relevant to microarray experiments. The other hemisphere is “MicroarrayDB,” which holds the microarray hybridization data input by FDA NCTR researchers in a form compliant with MIAME (minimum information about a microarray experiment). In the middle is “Tool,” the system’s corpus callosum, which includes visualization tools and normalization, clustering, significance analysis, and classification algorithms that enable viewing and analysis of the data, as well as a means for connecting the results of the analysis to information in the libraries.
The Lib component was tough to assemble, Tong said, because he and his colleagues had to figure out what the toxicologists needed and what was relevant to microarrays.
“We saw that most information people use comes from GenBank, UniGene, Gene Ontology, and KEGG,” Tong said. But rather than just construct links to these databases, “really what we did is make a mirror database, then reshuffled the information and reorganized it in such a way as to make it more convenient to microarray data analysis.” After a back-and-forth with the toxicologists that Tong characterized as “quite dynamic,” the team modified Lib to include information the toxicologists wanted.
Now, Tong’s group is planning to add another lobe to the Lib side: a toxicant library that includes data from the carcinogenicity potency and endocrine disruptor databases. “No matter how much genomics data you generate, without anchoring [this to] the phenotype information [it] does not tell you too much of the story,” he said. Researchers will be able to access the endocrine disruptor knowledge base, which includes more than 3,000 chemicals with in vitro and in vivo assay data. The group is setting up the system so users can search for structural similarity between a known toxicant and a chemical of interest. “We are just like Amazon.com,” he said. Just as Amazon directs users to books and other products similar to the ones they have purchased, the database will direct users to chemicals that are similar in structure to the one being examined in connection with microarray data, showing their effects on gene expression profiles.
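For readers who want a concrete picture of a structural similarity search, the sketch below ranks library chemicals against a query using the Tanimoto coefficient over sets of structural features. This is an assumption made for illustration only: the article does not say which similarity measure or structure representation ArrayTrack uses, and the chemical names and features here are made-up placeholders, not entries from the endocrine disruptor knowledge base.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / all features."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)


def most_similar(query: frozenset, library: dict, top_n: int = 3) -> list:
    """Return the top_n (name, score) pairs, highest similarity first."""
    scored = [(name, tanimoto(query, features))
              for name, features in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy library; names and features are placeholders.
TOY_LIBRARY = {
    "chem_A": frozenset({"phenol", "ring6", "hydroxyl"}),
    "chem_B": frozenset({"ring6", "chloro"}),
    "chem_C": frozenset({"amine", "chain"}),
}

query = frozenset({"phenol", "ring6"})
ranking = most_similar(query, TOY_LIBRARY, top_n=2)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real system, the feature sets would come from a chemical fingerprinting step, and the ranked neighbors would then be looked up against their stored assay and expression data.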
While it took work to construct these libraries of microarray-relevant data, the Tool side of the software was “the most difficult part,” Tong said. “You have so many algorithms already on the market; we do not want to reinvent the wheel.”
In the current version, ArrayTrack offers only a scatter plot viewer and a clustering tool, which is designed to interact directly with the libraries of information on genes, proteins, metabonomics, and toxicology. Since people who do cluster analysis often first scan the cluster for genes they know, Tong designed the system so that the cluster results directly link up to the library of known genes and other information. “This direct link to a library can shorten the interaction [time] between the knowledge and the algorithm,” he said.
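The idea of wiring cluster output straight into a gene library can be sketched roughly as follows. The record fields, the identifiers, and the `annotate_cluster` helper are all hypothetical, chosen only to illustrate the lookup step; ArrayTrack's actual schema and code are not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical library entry for one gene."""
    symbol: str
    description: str
    pathways: tuple


def annotate_cluster(cluster_ids, gene_library):
    """Split a cluster's gene IDs into annotated and unannotated groups.

    Returns (known, unknown): a dict of ID -> GeneRecord for IDs found
    in the library, and a list of IDs with no library entry.
    """
    known, unknown = {}, []
    for gene_id in cluster_ids:
        record = gene_library.get(gene_id)
        if record is None:
            unknown.append(gene_id)
        else:
            known[gene_id] = record
    return known, unknown


# Placeholder library; IDs and annotations are invented for illustration.
TOY_GENE_LIBRARY = {
    "gene_1": GeneRecord("G1", "placeholder description 1", ("pathway_x",)),
    "gene_2": GeneRecord("G2", "placeholder description 2", ()),
}

known, unknown = annotate_cluster(["gene_1", "gene_9", "gene_2"],
                                  TOY_GENE_LIBRARY)
```

With a lookup like this attached to the clustering output, a user scanning a cluster sees at once which members are already annotated and which are not, without leaving the analysis view.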
In the next edition of the software, version 3.01, Tong and colleagues plan to add self-organizing maps, support vector machines, and principal components analysis. They are also working on integrating the system with Spotfire’s visualization software.
Currently, the full database of microarray data is available to FDA personnel only (at http://weblaunch.nctr.fda.gov/jnlp/arraytrack), but the rest of the software, including Tool and the libraries, can be accessed by the public at http://edkb.fda.gov/webstart/arraytrack/.