US National Cancer Institute Stocks Up Arsenal Of Bioinformatics Applications


BETHESDA, Md.--Scientists at the National Cancer Institute (NCI), the largest of the 17 biomedical research centers operated by the US National Institutes of Health, have spent the past decade crafting a bioinformatics program to facilitate the institute's efforts to discover cancer cures.

Since 1990, NCI has tested more than 70,000 compounds for activity against 60 cancer cell lines. At least five have entered human trials. More than 460,000 other compounds await testing. Along the way, NCI has developed an arsenal of diverse genomics-based drug discovery tools.

John Weinstein, head of NCI's Bioinformatic and Biophysical Pharmacology Group, is helping to outfit the institute for the postgenomic era. "We are redefining our sense of what the kingdom of life is all about," he told BioInform. "We're doing it in terms of individual genes and gene products, developing inventory that will then be turned into sets of pathways and networks for signaling and metabolism."

The first bioinformatics weapon in NCI's armory was Compare, software developed in the 1980s by the late Ken Paull, a bioinformatics pioneer in the institute's Developmental Therapeutics Program. Compare takes a chosen compound and finds others in NCI's database that are most closely related, based on the pattern of their activity against the 60 cancer cell lines. Researchers found that Compare tracks down structurally similar compounds. "It is very simple to use, is conceptually straightforward, and has produced numerous examples of biologically validated predictions," Ed Sausville, chief of the Developmental Therapeutics Program, told BioInform.
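The kind of ranking Compare performs can be illustrated with a minimal Python sketch. It is not NCI's code: the article does not say which similarity measure Compare uses, so Pearson correlation is an assumption here, and the compound names, the `compare_rank` function, and the five-cell-line toy profiles (standing in for the real 60) are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / sqrt(var_x * var_y)

def compare_rank(seed, database, top=3):
    """Rank all other compounds by similarity to the seed's activity pattern.

    Assumption: similarity is Pearson correlation across cell lines.
    """
    seed_profile = database[seed]
    scores = [
        (name, pearson(seed_profile, profile))
        for name, profile in database.items()
        if name != seed
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]

# Hypothetical toy data: one activity value per cell line.
toy_db = {
    "compound_A": [5.1, 6.0, 4.2, 7.3, 5.5],
    "compound_B": [5.0, 6.2, 4.0, 7.1, 5.6],  # tracks compound_A closely
    "compound_C": [7.0, 4.1, 6.5, 4.0, 6.0],  # roughly the opposite pattern
    "compound_D": [5.5, 5.4, 5.6, 5.5, 5.5],  # nearly flat
}

ranked = compare_rank("compound_A", toy_db)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With the toy data, the compound whose profile tracks the seed most closely comes out on top, which is the behavior the article describes.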

Refinements to Compare have allowed researchers to use expression patterns of important individual proteins from cancer cells to identify new anticancer drugs. For example, in 1997 NCI's Susan Bates measured cell line mRNA expression for two genes, erbB2 and EGF, known to be overexpressed in certain virulent cancers, especially breast cancers. Bates then put Compare to the test. The program selected 25 compounds from NCI's drug database. Testing showed that 14 were true inhibitors of erbB2 or EGF gene expression. Several of the EGF-inhibiting compounds are now in early stages of development as potential breast cancer drugs.

Anyone may access the NCI drug database and submit information for a Compare search at

Compare could also help advance pharmacogenomics, the emerging science of tailoring drug therapy to individual patients based on their genetic profile. In 1996 NCI's Han-Mo Koo set out to discover whether anticancer drug potency depended on the presence of the ras oncogene. He identified cancer cell lines with ras mutations, then used Compare to find compounds that seemed to work selectively against them (and not against cells with wild-type ras). One drug that stood out was the chemotherapy agent ARA-C. Will cancer patients whose tumors bear ras mutations benefit from ARA-C chemotherapy? Clinical trial results await publication, but NCI is optimistic.
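A search of this kind, looking for compounds that act selectively on mutant cell lines, might be sketched as follows. The scoring rule (mean activity in mutant lines minus mean activity in wild-type lines) is an assumption made for illustration, not the method Koo used, and the drug names, mutation flags, and activity values are hypothetical.

```python
from statistics import mean

def selectivity(activity, is_mutant):
    """Mean activity in mutant lines minus mean activity in wild-type lines.

    Assumption: higher activity values mean greater drug potency.
    """
    mutant = [a for a, m in zip(activity, is_mutant) if m]
    wild_type = [a for a, m in zip(activity, is_mutant) if not m]
    if not mutant or not wild_type:
        raise ValueError("need at least one mutant and one wild-type cell line")
    return mean(mutant) - mean(wild_type)

def rank_selective(database, is_mutant):
    """Order compounds from most to least selective for the mutant lines."""
    scored = {name: selectivity(p, is_mutant) for name, p in database.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical mutation status for six cell lines (True = ras mutation).
ras_mutant = [True, False, True, False, False, True]

# Hypothetical activity profiles over the same six cell lines.
toy_activity = {
    "drug_X": [7.0, 5.0, 7.2, 5.1, 4.9, 6.8],  # more active in mutant lines
    "drug_Y": [6.0, 6.1, 5.9, 6.0, 6.2, 6.0],  # indifferent to ras status
    "drug_Z": [4.5, 6.5, 4.4, 6.6, 6.4, 4.6],  # more active in wild-type lines
}

selective_ranking = rank_selective(toy_activity, ras_mutant)
for name, score in selective_ranking:
    print(f"{name}: {score:+.2f}")
```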

Thus, Compare could help maximize the use of existing drugs, as well as identify new ones. "We know that 25 percent of breast cancer patients with a certain type subcategory will respond to a particular therapy, 30 percent to another therapy," said Weinstein. "We'd really like to know what that means, in molecular terms, and how to choose patients and their therapies more effectively."

A, S, and T databases

Compare's success launched Weinstein's group on a more ambitious trajectory. It now looks at drug discovery in terms of three databases--designated A, S, and T--that are the foundation for new and evolving bioinformatics approaches. A, named for "activity," contains the relative anticancer cell line activity of tested compounds. S, "structure," tags compounds according to structural motifs. The T database contains the pattern of molecular targets in each of the 60 cell lines.

"There are two ways of interrogating such databases," explained Weinstein. "From the outside in, looking for patterns, datamining. Or inside out, from particular genes, starting from a favorite gene, favorite protein, favorite drug, and asking about its relationships and its context. We're trying to develop techniques for both."

Constructing T was the biggest challenge, according to Weinstein. Cancer cell line targets could be any interesting molecular signature--protein, RNA, DNA. "No one had ever tried to develop a database across such a disparate set of cell types," he remarked.

To create useful proteomic profiles of cancer cells, Weinstein turned to collaborators both inside and outside NCI. They first used 2-D gel electrophoresis to generate protein fingerprints, but because protein expression gives an incomplete snapshot of cancer cells, NCI sought other methods. Together with Stanford, NCI created an mRNA expression database for the cancer cell lines, using an array system later marketed by Synteni. Affymetrix chips are also used.

Cancer cells have other fingerprints. For example, they often show cytogenetic abnormalities, like gene translocations. NCI's Tim Kirsch is building a database of such abnormalities using spectral karyotyping. Others are documenting gene amplification and loss, compiled using comparative genome hybridization microarrays.

Discovery software

Once they had built the A, S, and T databases, Weinstein's group wanted to do more than just compare drugs and targets one by one. For multidimensional analyses, the group developed Discovery, software that integrates the three databases and displays them in novel ways suited to human pattern recognition. Relationships appear on a grid and data are displayed as tiny squares of different colors. For example, anticancer compounds can be arrayed against targets, with the color red indicating drug activity and blue signifying drug resistance. A glance at the color pattern can suggest new hypotheses for the anticancer mechanism of various classes of drugs because compounds with similar mechanisms of action appear next to each other on the matrix.

Discovery makes that crucial feature possible with a clustering algorithm called ClusCor, which displays compounds in a branched tree modeled after the familiar trees that group organisms according to their evolutionary history. Different classes of anticancer compounds--for example, alkylating agents--cluster together perfectly. Unclassified compounds in the tree can be tentatively labeled according to the particular twig they inhabit. And cancer protein targets, for susceptibility or resistance to drugs, can be identified according to color in the matrix, for later biological investigation.
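How a clustering step can place similar compounds side by side is shown in the rough Python sketch below. The article does not describe ClusCor's internals, so the choices here (agglomerative clustering with average linkage and a distance of 1 minus the Pearson correlation) are assumptions, and the compound names and profiles are hypothetical toy data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def cluster_order(profiles):
    """Average-linkage agglomerative clustering; returns names in leaf order.

    Assumption: distance between two compounds is 1 - Pearson correlation.
    """
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names
        for b in names
    }

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical profiles, deliberately interleaved by drug class.
toy_profiles = {
    "alkylator_1": [6.0, 4.0, 6.5, 4.2, 5.0],
    "tubulin_1":   [4.0, 6.5, 4.1, 6.0, 5.2],
    "alkylator_2": [6.2, 4.1, 6.4, 4.0, 5.1],
    "tubulin_2":   [4.2, 6.4, 4.0, 6.2, 5.0],
}

order = cluster_order(toy_profiles)
print(order)
```

Although the input interleaves the two classes, the returned order puts each pair next to each other, which is the property that lets a reader spot drug classes as blocks of color in the matrix.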

NCI's Tim Myers used Compare and Discovery together to create a display of many well known anticancer drugs, set against various indicators of p53 pathway status in the cell lines. Color patterns clearly showed that the activity of most common anticancer drugs in NCI's assay is linked to normal p53 gene expression. Because the p53 gene is mutated in more than half of all cancers, and such mutations may, in some cases, defeat chemotherapy, NCI looked for drugs that were effective regardless of a cancer's p53 state. Taxol is one example. NCI found that a class of drugs called ellipticiniums is another. These links are now under investigation.

Revolutionary approach

This general approach to biology--examining broad patterns in order to select individual drugs, proteins, or genes to study--is a sharp break with past practices. Biologists traditionally sought a deep understanding of individual molecular agents, which they hoped would reveal general principles. The work of Weinstein's group goes in the other direction, from the general to the particular. "It's very important that the two approaches be made use of synergistically," he said.

Still, Weinstein has faced open skepticism from some colleagues who deride his "fishing expeditions." That kind of talk is diminishing, he said. "Now there's so much interest in, and excitement about, the notion of large aggregates of genes and profiling large aggregates of genes that it's less of a problem than it was," Weinstein remarked. In a recent letter to Science, he took on the skeptics: "If one is going to fish, it is best to do so in teeming waters with the finest equipment and flawless technique."

--Ken Garber
