The National Institute of General Medical Sciences announced last week that its 10-year Protein Structure Initiative has so far determined the structures of more than 1,000 proteins.
The PSI, which is currently wrapping up its first phase, is a $600 million project launched in 2000 to determine the three-dimensional shapes of 4,000 to 6,000 unique proteins that represent the variety found in organisms ranging from bacteria to humans (see ProteoMonitor 2/13/04). Researchers can use these structures to build computer models of the structures of other proteins with related amino acid sequences.
“The protein structures solved by the PSI are more than a scientific stamp collection,” said John Norvell, director of the PSI at NIGMS. “They will help researchers better understand the function of proteins, predict the shape of unknown proteins, quickly identify targets for drug development, and compare protein structures from normal and diseased tissues.”
The first phase of the PSI focused on developing new tools and automated processes to enable researchers to quickly, cheaply, and reliably determine the shapes of proteins. During that phase, nine PSI pilot centers transformed protein structure determination from a mostly manual process to a highly automated one.
As robotic instruments were developed and implemented to rapidly clone, express, purify, crystallize, and analyze many proteins simultaneously, the time needed to determine the structure of a single protein fell from months to days. As a result, the PSI centers have solved more structures each year: in 2002, the second year of the project, 109 structures were solved; in the third year, 217; in the fourth year, 348; and now, halfway through the fifth year, the total number of structures solved has surpassed 1,000.
“At this large scale, it would be unthinkable to do all these steps by hand,” said Norvell.
Norvell noted that some of the robotics and automated tools have been refined and are now marketed by companies for general structural biology applications.
The PSI differs from what has been done in structural biology over the past 50 years in that it systematically solves the structures of non-redundant proteins, each representing a different class of proteins, Norvell pointed out.
“Almost 30,000 protein structures have been solved over the last 50 years,” said Norvell. “Most of these are highly redundant — there are many copies of the same protein that have been solved. Most of these structures have been done when a protein is well characterized — you know its function and you’re trying to find out more about it in terms of atomic details.”
In contrast, most of the proteins solved by the PSI are not well characterized, and PSI proteins are completely non-redundant: a protein is solved only if no other protein of the same class has already been solved.
“The idea of structural genomics is to use sequence information to put proteins into families — to classify proteins by their families of similar sequences — and then to solve a representative of each of those,” said Norvell. “If there’s already a structure that’s been solved in the group, then we don’t do it. The long-term goal is to get structural information about most of the genes that have been sequenced.”
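The selection rule Norvell describes can be illustrated with a minimal Python sketch. It assumes the proteins have already been grouped into sequence families by some upstream method; the function name, the protein and family labels, and the rule for choosing a representative are all hypothetical and are not taken from the PSI centers' actual software.

```python
from collections import defaultdict


def select_targets(family_of, solved):
    """Pick one representative from each family that has no solved structure.

    family_of: dict mapping protein ID -> family ID (hypothetical labels).
    solved: set of protein IDs whose structures are already known.
    """
    # Group proteins by their sequence family.
    members = defaultdict(list)
    for protein, family in family_of.items():
        members[family].append(protein)

    targets = {}
    for family, proteins in members.items():
        # Skip any family that already contains a solved structure.
        if any(p in solved for p in proteins):
            continue
        # Assumption: any member can represent the family, so take the
        # first ID in sorted order to keep the choice deterministic.
        targets[family] = sorted(proteins)[0]
    return targets


# Hypothetical example: fam1 already has a solved member (protB).
family_of = {
    "protA": "fam1", "protB": "fam1",
    "protC": "fam2", "protD": "fam2",
    "protE": "fam3",
}
solved = {"protB"}
print(select_targets(family_of, solved))
```

In this toy example only fam2 and fam3 yield targets, because fam1 already has a solved structure. A real pipeline would presumably weigh other factors when choosing a representative, such as how likely a protein is to express and crystallize.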
Jurek Osipiuk, a protein crystallographer at the Structural Biology Center at Argonne National Laboratory, said that the PSI protein databank will save researchers considerable time in solving protein structures.
“If you have the structure for one of the proteins [in a particular family of proteins], it is much easier to solve the structure for homologues, for sure,” said Osipiuk. “This will decrease the cost for single-structure crystallography.”
Osipiuk added that the PSI project is useful for understanding protein function, even for researchers who are not looking to solve a protein structure.
“Let’s say you have DNA sequence,” said Osipiuk. “You have only one dimension. If you have a protein representative, you can see how it will interact with other proteins, or with nucleic acids or potential drugs.”
Though the PSI’s pilot phase focused on automating structure determination rather than on analyzing structures, some interesting properties have already emerged from a preliminary analysis of the first 1,000 structures.
“Many new structures and new folds have been discovered,” said Norvell. “There are a number of structures that are very unusual, including a structure that ties a knot at the end of the amino acid chain.”
Some other proteins are interesting in terms of evolution, Norvell added. They have structures that are very similar to each other in terms of shape, but their sequence similarity is very low.
The next five-year phase, which will begin in July, will focus on determining harder-to-solve protein structures, such as the structures of membrane proteins.
A challenge of the second phase will be to go back to every step along the pipeline from DNA to protein structure and to recover the losses at each step where there is leakage, Norvell said.
“As we reach for higher-hanging fruit — protein structures that are more complex and harder to solve — we will need to develop additional tools and methods,” he said.
The nine pilot centers participating in the first phase of the PSI are: the Berkeley Structural Genomics Center; the Center for Eukaryotic Structural Genomics; the Joint Center for Structural Genomics; the Midwest Center for Structural Genomics; the New York Structural Genomics Research Consortium; the Northeast Structural Genomics Consortium; the Southeast Collaboratory for Structural Genomics; the Structural Genomics of Pathogenic Protozoa Consortium; and the TB Structural Genomics Consortium.
Centers for the second phase of the PSI project will be announced in July 2005.
More information about the PSI can be found at www.nigms.nih.gov/psi.