At A Glance
- Weida Tong
- Director, Center for Toxicoinformatics, FDA National Center for Toxicological Research, June 2002 - present
- Researcher, FDA NCTR, 1996 - present
- Researcher, Department of Chemistry, University of Missouri-St. Louis, 1990 - 1996
- PhD, Polymer Chemistry, Fudan University, China
After earning his PhD from Fudan University in China, Weida Tong moved to Little Rock, Ark., in 1996 to work for the FDA’s National Center for Toxicological Research, developing a database of endocrine disruptors.
Seven years later, Tong is still at the NCTR, and in June 2002 was named the director of the new Center for Toxicoinformatics, where his first task has been to develop a way to help scientists handle and interpret the center’s growing body of toxicology-related microarray data.
Tong and his colleagues recently presented their solution, ArrayTrack, in a poster at the Beyond Genome conference on June 16, and have submitted an article about ArrayTrack for publication.
BioArray News recently spoke to Tong about this work.
Can you tell me a little bit about the FDA National Center for Toxicological Research and how you came to the Center for Toxicoinformatics?
NCTR is the only center in the FDA that only focuses on research without the regulatory responsibility. We provide advice to other regulatory centers for the regulatory mission, in any case, and we have been doing traditional toxicology research for many, many years and mainly using the animal models and other in vitro, or in vivo techniques. Now everybody’s involved [with] the ‘omics technology, and it’s not surprising [that] our center also started to [become] involved [with] that type of research.
About a year and a half ago, we established five centers of excellence at the NCTR. One center, called the Center for Functional Genomics, was set up as a core facility to support microarray experiments. We have the Center for Structural Genomics; we have the Center for Hepatotoxicity; we also have the Center for Toxicoinformatics, which is my center. We also have a Center for Phototoxicity. The reason we have set up these centers is that we realize these so-called ‘omics technologies heavily rely on the recently advanced high-throughput technology, and these high-throughput experiments generate a huge amount of data. So the informatics-related support becomes more and more important to ensure the success of these areas of research. The primary function of my center (and I moved to the government a year ago, so this center was established June 1 last year) is to provide bioinformatics, chemoinformatics, and computational toxicological support to the researchers at NCTR and beyond to the FDA.
So you started last June to provide bioinformatics and cheminformatics support to researchers. In your poster at Beyond Genome, you said you were developing a “Toxicoinformatics Integrated System,” and that you were developing a microarray database, ArrayTrack, as the first step toward developing this system.
Exactly. Actually, my center provides a ramp in to two areas. We do the research, [and] at the same time provide the support, so ArrayTrack is one of our main [areas of] focus for the support. ArrayTrack was initially built a year ago. We saw the demands for microarray data management, so we developed the ArrayTrack infrastructure to support the microarray infrastructure. Here, particularly, the Center for Functional Genomics requires such support. Of course, in the long run, we also [will] try to integrate the genomics data with the proteomics data and the metabonomics data. We have the proteomics group and the metabonomics group at NCTR. I think they are still developing the technology, and have not gotten into high-production volume at this moment. But the toxicoinformatics integrated system is shooting [in] the long run to try to integrate all these data.
You said that the NCTR uses a lot of microarrays. What is the throughput of microarray data?
When we’re talking microarrays, I think I should be more specific. It’s DNA microarrays. We have the in-house facility developed [for DNA microarrays]. Most research scientists are going to buy the chip from the various companies, such as a Research Genetics, and Affymetrix, and Clontech, and so forth … but we also have a core facility set up and we [are] printing the slides in-house, and the reason we do that [is], we did an estimate, [and found they] cost 10 percent [of what] the commercial chips [do]. So we can do it really cheap. So far, we do not have a lot of data at this moment. The one reason is, we try to overcome a number of technological difficulties at the center, and we are just trying to pick up speed. So far we have about close to 500 to 600 hybridization [experiments] in the database. When [we are at] full speed, more and more data will come in the near future.
From what I understand, ArrayTrack includes not only MicroarrayDB, which includes the actual microarray data, but “Lib,” a library that mirrors data in public databases that is relevant to microarray experiments. With Lib, how do you select the data that is relevant to microarrays?
First of all, we selected the most popular databases. We read a lot of literature and see the information most people use, and [then] selected GenBank, UniGene, the Gene Ontology, and KEGG. These are very reputable databases often used by the research community, but really what we did is make a mirror database, then reshuffled the information and reorganized it in such a way [to be] more convenient to microarray data analysis.
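As a rough illustration of what reorganizing mirrored records around microarray analysis might look like, the sketch below merges per-source records into one gene-centric entry. The record layout, field names, and identifiers are hypothetical placeholders, not ArrayTrack's actual schema or real database entries.

```python
from collections import defaultdict

# Hypothetical mirrored records, one list per public source.
# Identifiers and field names are placeholders, not real database entries.
genbank = [{"gene": "GENE_A", "accession": "ACC0001"},
           {"gene": "GENE_B", "accession": "ACC0002"}]
unigene = [{"gene": "GENE_A", "cluster": "CLUSTER_1"}]
gene_ontology = [{"gene": "GENE_A", "term": "term_x"},
                 {"gene": "GENE_A", "term": "term_y"},
                 {"gene": "GENE_B", "term": "term_x"}]
kegg = [{"gene": "GENE_B", "pathway": "pathway_p"}]


def build_gene_index(sources):
    """Merge per-source records into one entry per gene symbol."""
    index = defaultdict(lambda: defaultdict(list))
    for records, field in sources:
        for record in records:
            index[record["gene"]][field].append(record[field])
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {gene: dict(fields) for gene, fields in index.items()}


gene_index = build_gene_index([
    (genbank, "accession"),
    (unigene, "cluster"),
    (gene_ontology, "term"),
    (kegg, "pathway"),
])
```

With an index like this, a gene that shows up on an array can be looked up once, for example gene_index["GENE_A"], instead of being queried against each source separately.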
So in this “Lib” system, you have GeneLib, ProteinLib, and PathwayLib (for gene, protein, and pathway information), and you are planning to add additional data? If so, what?
We are definitely going to add the ToxicantLib. I cannot emphasize more the importance of this library: the tox data from various chemicals. No matter how much genomics data you generate, without anchoring [it to] phenotype information, it does not tell you too much of the story. The traditional toxicology endpoint is a traditional component in ArrayTrack. We have data from the NCTR carcinogenicity potency database, and also from [an] endocrine-disruptor knowledge base (http://edkb.fda.gov). It also contains over 3,000 chemicals with various in vitro and in vivo assay data. One important mission we have now is chemical structure similarity searching. You look at the chemical structure and compare any chemical in the ToxicantLib [to it]. If information is available for that particular chemical, we can tell [it’s] just like Amazon.com. If [you are interested in these chemicals], you will like another chemical similar to these chemicals. These are fundamental principles based on chemistry. Two chemicals of similar structures are likely to have similar toxicology. …
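The interview does not say which similarity measure the ToxicantLib search uses. As a minimal sketch of the general idea, the example below assumes each chemical is described by a set of structural features and ranks library entries by the Tanimoto coefficient, a measure widely used in cheminformatics. The chemical names and feature labels are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query, library, top_n=3):
    """Rank library chemicals by similarity to the query feature set."""
    scored = [(tanimoto(query, feats), name) for name, feats in library.items()]
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 3)) for score, name in scored[:top_n]]


# Hypothetical library: chemical name -> structural features present.
toxicant_lib = {
    "chem_1": {"phenol", "ring", "chloro"},
    "chem_2": {"phenol", "ring"},
    "chem_3": {"amine", "chain"},
}

ranking = most_similar({"phenol", "ring", "hydroxyl"}, toxicant_lib)
# ranking == [("chem_2", 0.667), ("chem_1", 0.5), ("chem_3", 0.0)]
```

The top-ranked entries are the "you may also like" candidates in the Amazon.com analogy: chemicals whose known toxicology data may be informative about the query.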
Now you link together MicroarrayDB and the Lib with a tool called, well, Tool, which includes algorithms and visualization tools. Can you tell me how this works?
One emphasis in our toolbox is, we will allow the tool to directly interact with the library. The problem is, so far with the analysis tools, you have so many choices even just using the cluster analysis. … I can name 10 different methods where you can do the same hierarchical cluster analysis. All these methods give you the same results. But all biologists do, when one [of these methods] gives the correct answer, is they look to each cluster and see whether the gene they know is on the list. [This] is a knowledge-dependent approach, even though a fancy algorithm is on the front end. If the person doesn’t have such knowledge, it’s hard to judge which algorithm fits the correct answer. We tried to have some cluster analysis [and] direct a link to the library so you can shorten the interaction between the knowledge and the algorithm.
In the next release [of ArrayTrack] we will have the clustering analysis and self-organizing maps and support vector machines and principal components analysis. All these tools will be [connected to] the library — it’s a more dynamic approach.
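One way to picture linking cluster output directly to a library is sketched below: given cluster assignments from any method, it counts the pathway annotations found for each cluster's genes. This is an illustrative assumption about the workflow, not ArrayTrack's implementation, and the gene names, cluster names, and pathway labels are hypothetical.

```python
from collections import Counter

# Hypothetical cluster assignments produced by any clustering method.
clusters = {
    "cluster_1": ["GENE_A", "GENE_B", "GENE_C"],
    "cluster_2": ["GENE_D", "GENE_E"],
}

# Hypothetical library: gene -> pathway annotations.
pathway_lib = {
    "GENE_A": ["pathway_p"],
    "GENE_B": ["pathway_p", "pathway_q"],
    "GENE_C": ["pathway_q"],
    "GENE_D": ["pathway_r"],
}


def annotate_clusters(clusters, library):
    """Count library annotations per cluster and list genes with no entry."""
    summary = {}
    for name, genes in clusters.items():
        counts = Counter(term for g in genes for term in library.get(g, []))
        missing = [g for g in genes if g not in library]
        summary[name] = {"pathways": dict(counts), "unannotated": missing}
    return summary


summary = annotate_clusters(clusters, pathway_lib)
```

A summary like this lets someone without prior knowledge of the genes see which pathways dominate each cluster, rather than scanning the list for a gene they already recognize.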
Do you allow for use of this information in the database with other outside visualization tools and algorithms?
So far we are looking into the option to integrate with Spotfire’s [software]. We have bought the [enterprise version] of Spotfire, but we have not completed that work yet. We are also very closely [looking at] Bioconductor. That website is basically a public resource, an open resource to allow scientists to deposit their algorithms in this area for microarray analysis.
How do you integrate all of the various parts of ArrayDB?
We have an Oracle back end and a Java front end.
Who has access to ArrayTrack?
ArrayTrack is publicly available. There are two websites, one for inside FDA (http://weblaunch.nctr.fda.gov/jnlp/arraytrack) and one for outside FDA (http://edkb.fda.gov/webstart/arraytrack). [Outside FDA] you can use the library and the tools but cannot use the database, because we do not support a public repository of data. The simple reason for this is we do not have the resources to do it. But we do have a CD and DVD for local installation [of the database].