ROCKVILLE, Md.--Last month the National Cancer Institute (NCI) and Vice-President Gore launched the new Cancer Genome Anatomy Project (CGAP) web site, http://www.ncbi.nlm.nih.gov/ncicgap, where scientists can rapidly access project data. The site is one example of how bioinformatics will play a key role in CGAP, which was unveiled by NCI earlier in the year.
Part of NCI's five-pronged strategy to combat cancer, CGAP has two objectives: to stimulate the discovery of molecular changes that lead to and are brought on by cancer, and to evaluate the clinical utility of those findings. To achieve these goals, the project will promote the dissemination of technologies that can read and interpret the disease's molecular features.
According to Kenneth Katz of the National Center for Biotechnology Information (NCBI)--which, like NCI, is a division of the National Institutes of Health--bioinformatics will play three key roles in CGAP: data organization, quality control, and analysis. NCBI is managing the CGAP database and maintaining the web site.
"The first is just keeping track of all the data that we're generating, which is crucial," he told BioInform. "In the long run we need to analyze this data, and the more information we have and the better organized it is, the more information we're going to get out of it." For example, Katz estimated that 400,000-800,000 DNA sequences per year can be obtained from several hundred tissue libraries. Keeping track of the sequences, the methods used to generate them, the tissue they came from, and the histological state of the tissue is a formidable task. Without bioinformatics tools to organize and access the data, they would remain virtually useless, he observed.
The ability to screen the quality of the data while they are being generated is another key role for bioinformatics. "Specifically with sequence data," Katz continued, "the role of the bioinformatics tools is analyzing the data as they are generated so that we can know when we have good data or when we have bad data. That's sort of a quality control. Bioinformatics is used very crucially at that level." Among the factors to be screened for at this stage are suspicious data sequences such as bacterial or viral DNA, repetitive sequences, and mitochondrial DNA, Katz said. Another goal is to determine the total number of sequences generated and, especially, the number of new sequences generated, so researchers can direct their efforts toward the most promising material.
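The screening step Katz describes can be pictured with a minimal Python sketch. It is an illustration only, not CGAP's actual pipeline: the `SequenceRecord` type, its `category` label, and the `screen_batch` function are hypothetical names, and the sketch assumes each sequence has already been labeled by some upstream comparison step.

```python
from dataclasses import dataclass

# Categories treated as suspicious, taken from the factors named in the article.
SUSPECT_CATEGORIES = {"bacterial", "viral", "repetitive", "mitochondrial"}


@dataclass(frozen=True)
class SequenceRecord:
    seq_id: str
    sequence: str
    category: str  # hypothetical label, e.g. "nuclear" or "bacterial"


def screen_batch(batch, known_sequences):
    """Count total, flagged, clean, and previously unseen sequences in a batch."""
    flagged = [r for r in batch if r.category in SUSPECT_CATEGORIES]
    clean = [r for r in batch if r.category not in SUSPECT_CATEGORIES]
    # A clean sequence is "new" if it is not already in the known set.
    novel = {r.sequence for r in clean} - set(known_sequences)
    return {
        "total": len(batch),
        "flagged": len(flagged),
        "clean": len(clean),
        "novel": len(novel),
    }


if __name__ == "__main__":
    batch = [
        SequenceRecord("s1", "ACGT", "nuclear"),
        SequenceRecord("s2", "GGCC", "bacterial"),
        SequenceRecord("s3", "TTAA", "nuclear"),
    ]
    print(screen_batch(batch, known_sequences={"ACGT"}))
```

Running such a count as each batch arrives would show how much of the output is usable and how much is new, which is the information researchers need to decide where to direct further sequencing.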
The final role is key in guiding research efforts after a database of DNA sequences is assembled. "Once you've generated the data and you've kept all the information in a well organized manner that relates to that data, now you develop the bioinformatics tools to analyze the data," Katz explained. "Ultimately, as an individual researcher, it helps you target your research and your lab. Hopefully, it saves your time and makes your time and money better directed and more well spent in generating the data that we all want."
At this point bioinformatics tools should help researchers narrow down the list of genes found in a specific tissue to those that are either turned on or off as cells change from normal to precancerous to cancerous stages. Those genes would be targeted for further research. According to Katz, bioinformatics will be useful in developing the technological tools to screen tens of thousands of genes at once for signs of cancer development. The discipline will also provide better, faster algorithms for comparing new DNA sequences to known genes and for determining the biological function of the sequences.
The CGAP web site will feature a program, containing a catalog of DNA sequences from normal, precancerous, and cancerous cells, that will allow researchers to identify genes that are activated or deactivated during cancer development. Known as the Digital Differential Display, the program is still under development but should be available early this month, Katz said. Meanwhile, a preview is posted at the CGAP web site.
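The kind of comparison such a program makes can be sketched in a few lines of Python. This is not the Digital Differential Display algorithm, which the article does not describe. It is a simplified illustration that assumes each library is a list of gene labels, one per sequenced clone, and that uses a hypothetical fold-change cutoff (`min_ratio`) in place of a proper statistical test.

```python
from collections import Counter


def expression_fractions(library):
    """Return each gene's share of the sequences in one library."""
    counts = Counter(library)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def compare_libraries(normal, cancerous, min_ratio=2.0):
    """Label genes whose share of sequences differs by at least min_ratio."""
    f_normal = expression_fractions(normal)
    f_cancer = expression_fractions(cancerous)
    changed = {}
    for gene in sorted(set(f_normal) | set(f_cancer)):
        a = f_normal.get(gene, 0.0)
        b = f_cancer.get(gene, 0.0)
        if a == 0.0:
            changed[gene] = "on in cancer"    # seen only in the cancerous library
        elif b == 0.0:
            changed[gene] = "off in cancer"   # seen only in the normal library
        elif b / a >= min_ratio:
            changed[gene] = "up in cancer"
        elif a / b >= min_ratio:
            changed[gene] = "down in cancer"
    return changed


if __name__ == "__main__":
    # Gene labels G1..G4 are placeholders, not real genes.
    normal = ["G1", "G1", "G2", "G3"]
    cancerous = ["G1", "G2", "G2", "G4"]
    print(compare_libraries(normal, cancerous))
```

With libraries this small any difference could be chance, so a real tool would need to weigh library size before calling a gene activated or deactivated.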
Commenting on the function of the web site, Robert Strausberg of the National Cancer Institute said, "What we want to do is to provide more than just the raw DNA sequences to the community. It's important to also have analysis tools for you to look at the information and make it more meaningful. So the important thing about the web site is that it is going to have informatics tools, analysis tools that will be available to everybody."
"For example, one might want to ask what are the most prominent expressed genes in a particular cDNA library from a particular tissue at a particular stage of cancer," he continued. "That kind of information should be available. The other thing that we'd like to do is to be able to look at different cDNA libraries that come from a normal precursor to a cancer cell and then a precancerous and a full-blown cancer cell. We'd like to be able to look and compare, in fact, what are the genes, what are the significant differences in expression levels."
The information collected by CGAP will be available to all researchers, regardless of their affiliation, Strausberg added, and may be used in the development of patentable inventions.
--David M. Lawrence