As the head of the Donald Danforth Plant Science Center’s Laboratory of Computational Genomics, Jeff Skolnick is finding plenty of work for one of the most powerful computer systems in the world: He’s well on his way to predicting the structure of every protein in an entire genome.
Capable of up to 335 gigaflops, the Danforth Center’s Beowulf cluster of 1040 Intel Pentium III processors is the largest of its kind devoted to plant research and ranks 41st in the world in raw computing power. The center estimates that the $2 million cluster built by Western Scientific is roughly equivalent to a $25 million Cray T3E-1200/504.
The system is getting its first workout folding the Mycoplasma genitalium genome, which Skolnick called “the hydrogen atom of all genomes” because it has only 480 proteins, 85 of which are shorter than 150 residues.
The Danforth researchers have developed threading methods for protein folding, which align a sequence onto a known structure, but the team has found that its ab initio folding approaches for predicting low-resolution structures work well for a substantial portion of small, single-domain proteins. Skolnick said that the researchers have been able to identify active sites in proteins using the low-resolution models and have even been able to dock ligands to the structures, a step toward predicting their biochemical function.
The center is on track to produce models for about 40 of the M. genitalium proteins using ab initio methods, compared with 20 using threading.
“One of the problems with threading is even if it gets the global topology right, the alignment may be wrong,” Skolnick said. In response, the Danforth researchers developed a technique called generalized comparative modeling, which is designed to work on pairs of proteins whose sequence identity is below about 30 percent and helps refine the threaded structures.
While Skolnick considers the M. genitalium genome “almost done,” he noted that the end point of the project would actually come at the 60 percent mark because “a third of the proteins are membrane proteins and they’re still hopeless.”
M. genitalium will be followed by the yeast, worm, fly, and human genomes. Skolnick said he expects to be well into the human genome by next year.
Also in the works are methods to predict macromolecular interactions and to map the proteins onto pathways, as well as better algorithms for folding those troublesome membrane proteins.
“The goal is to use predicted structure to predict biochemical function and to be able to do it not just for one or two proteins but for entire genomes,” said Skolnick.