NEW YORK--Around the US this month, teams of protein-structure-modeling experts scrambled to meet a grant proposal deadline, hoping to be awarded millions of dollars for research that they said will have profound implications for genomics-based drug development.
The US National Institutes of Health is expected to allot $3 million a year for five years to each of as many as six structural genomics pilot programs.
Scientists participating in the New York-based Structural Genomics Initiative, a consortium of researchers from Rockefeller University, Mount Sinai School of Medicine, Weill Medical College of Cornell University, Brookhaven National Laboratory, and the Albert Einstein College of Medicine, told BioInform that the grant money would enable them, in collaboration with scientists involved in similar efforts around the globe, to determine within a few years the structures of almost all protein domains in the human genome. Remarked Andrej Sali, assistant professor at Rockefeller, "In the space of a few years--on the order of one graduate student’s thesis--we’ll go from knowing almost nothing about protein structure to knowing almost all. Because structure is so powerful, the impact is going to be pretty phenomenal."
Going the modeling distance
Rockefeller computational biologist Terry Gaasterland, who, with Sali, is building a public database of protein models (http://pipe.rockefeller.edu/modbase), has created software that turns raw genome sequence data into predicted proteins with assigned functions. For every piece of a protein, Gaasterland said, "we know which pieces of other proteins it matches across all 45 of the completely sequenced organisms and chromosomes." With that information, Gaasterland is able to build families of protein sequence domains.
The closer the identity between two sequences, the better the model that can be built, she explained. By selecting the sequences from each domain that offer the most possibilities for building models, Gaasterland said she and her colleagues should be able to build models for all other sequences in the same group. The aim is to identify enough structures so that every protein sequence that can be translated from a genome is within modeling distance of a structure.
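The selection idea Gaasterland describes can be sketched in a few lines of Python. This is only an illustration, not the consortium's software: the function name `select_targets`, the 30 percent identity cutoff standing in for "modeling distance," and the five made-up sequences are all assumptions. The sketch greedily picks the sequence that brings the most still-uncovered sequences within the cutoff, and repeats until every sequence has a potential template.

```python
def select_targets(pairs, threshold=0.30):
    """Greedily choose sequences to solve so every sequence has a template.

    pairs: iterable of (name_a, name_b, identity) with identity in [0, 1].
    threshold: hypothetical minimum identity for "modeling distance".
    """
    names = sorted({name for a, b, _ in pairs for name in (a, b)})
    # A solved structure covers itself and every neighbor at or above the cutoff.
    reach = {name: {name} for name in names}
    for a, b, identity in pairs:
        if identity >= threshold:
            reach[a].add(b)
            reach[b].add(a)

    uncovered = set(names)
    chosen = []
    while uncovered:
        # Pick the sequence covering the most uncovered sequences;
        # ties go to the alphabetically first name.
        best = max(names, key=lambda n: len(reach[n] & uncovered))
        chosen.append(best)
        uncovered -= reach[best]
    return chosen


# Made-up pairwise identities for one hypothetical family.
example_pairs = [
    ("A", "B", 0.45),
    ("A", "C", 0.38),
    ("B", "C", 0.25),
    ("C", "D", 0.33),
    ("D", "E", 0.12),
]
print(select_targets(example_pairs))
```

In this toy family, three experimentally determined structures would be enough to put all five sequences within the assumed cutoff. A greedy pick like this is a common heuristic and is not guaranteed to find the smallest possible set of targets.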
Implications for drug discovery
Sali noted that structural genomics research will make two major contributions to drug discovery. For one, by providing many more protein structures than are currently available, structural genomics will increase scientists’ general understanding of how proteins fold. Even more significantly, Sali said, structural genomics will contribute to creating new strategies for the drug discovery process by determining examples of all protein shapes.
Sali said that the New York consortium, which gives priority to disease-related proteins, has so far solved four structures and has 111 proteins in its pipeline. But those numbers are "tiny in comparison to what will happen once real funding is available," he said. Sali and Gaasterland said the worldwide structural genomics research effort aims to produce models for around 15,000 protein families.
While Sali estimated it will take at least five years to achieve that end, he noted that structures representing half of all families, covering 90 percent of all proteins, could be determined within four years. "The size of protein families is not uniform, and the largest families with the largest numbers of proteins in them will be dealt with early on," he said. Sali added that there will be immediate daily benefits--essentially every time a new structure is determined.
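The arithmetic behind that estimate is easy to illustrate. The sketch below uses invented family sizes, chosen only to show how uneven sizes let half of the families account for most of the proteins; the function name `coverage_of_largest` is likewise hypothetical, and none of the numbers come from the consortium.

```python
def coverage_of_largest(sizes, family_fraction):
    """Fraction of all proteins covered by solving the largest families first.

    sizes: number of proteins in each family (illustrative values only).
    family_fraction: share of families with a determined structure, 0 to 1.
    """
    ordered = sorted(sizes, reverse=True)
    n_solved = int(len(ordered) * family_fraction)
    return sum(ordered[:n_solved]) / sum(ordered)


# Ten invented families containing 100 proteins in total.
family_sizes = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
print(coverage_of_largest(family_sizes, 0.5))
```

With these toy numbers, the five largest families hold 90 of the 100 proteins, so solving half of the families covers 90 percent of the proteins.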
Still, according to Gaasterland, the real key to drug discovery will be using protein structure information to learn more about protein function. And that, she said, will require bioinformatics tools that don’t yet exist.