“What’s really exciting about the whole field of network alignment now is that we are in a sense where the sequence alignment people were in, say, 1975,” said Trey Ideker, co-developer of PathBlast — a software tool that by his own analogy might be considered the Needleman-Wunsch of the post-sequence era.
In 1975, Ideker said, “there was just enough sequence data that people could think about aligning them and developing better and better algorithms for doing that, and we’re in that same stage now with interaction networks.”
With just a handful of interaction networks currently available, “there’s enough data that we can conceive of this notion and explore how the algorithmics work,” he said.
“So if you start in 1970 or 1975 with sequence alignment and then track all the developments in bioinformatics that happened over the next 10 years, you get a very good impression of what all the required developments will be for network alignment.”
Needleman-Wunsch, the first dynamic programming algorithm for sequence comparison, was published in 1970, followed by Smith-Waterman in 1981, and then Blast in 1990 — all improvements in either the speed or the accuracy of the original concept.
The next step after pairwise alignment, according to Ideker, is multiple alignment across three or more species, “and with our recent PNAS paper we at least show first principle of that.”
In that paper [Proc Natl Acad Sci USA. 2005 Feb 8;102(6):1974-9], Ideker, a bioengineering professor at the University of California, San Diego, and his colleagues used PathBlast to compare the protein-protein interaction networks of Caenorhabditis elegans, Drosophila melanogaster, and Saccharomyces cerevisiae. The alignment revealed 71 regions that were conserved across worm, fly, and yeast, pointing the way toward novel functional annotations for several thousand proteins in the three species.
The paper generated a great deal of interest in PathBlast, Ideker said, adding that the response came as a bit of a surprise because it’s actually the third paper on the method that the developers have published over the last year. “I can only think the pump was primed or something by those other papers. When they were coming out a year ago, I think the idea was still sort of foreign to people,” he said.
PathBlast compares two or more networks by aligning proteins, one by one, along a pathway. The method first assesses the sequence similarity between a protein and its potential ortholog, and then scores the quality of the protein interactions in order to find the optimal alignment between two networks.
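The two-part scoring described above can be illustrated with a minimal Python sketch. It is not PathBlast's actual implementation: the function names, the log-sum scoring, the 0-to-1 similarity and confidence values, and the `floor` parameter are all assumptions made for illustration, and this toy version only aligns paths of equal length, with no gaps or mismatches.

```python
from math import log


def score_path_alignment(path_a, path_b, similarity, conf_a, conf_b, floor=1e-6):
    """Score aligning path_a (network A) with path_b (network B), position by position.

    similarity: dict mapping (protein_a, protein_b) -> assumed similarity in (0, 1]
    conf_a, conf_b: dicts mapping frozenset({p, q}) -> assumed interaction confidence in (0, 1]
    floor: hypothetical small value substituted for missing or zero scores
    """
    if len(path_a) != len(path_b):
        raise ValueError("this toy version only aligns paths of equal length")

    score = 0.0
    # Node term: how similar is each protein to its proposed ortholog?
    for a, b in zip(path_a, path_b):
        score += log(max(similarity.get((a, b), 0.0), floor))

    # Edge term: how reliable is each consecutive interaction in each network?
    for i in range(len(path_a) - 1):
        edge_a = frozenset((path_a[i], path_a[i + 1]))
        edge_b = frozenset((path_b[i], path_b[i + 1]))
        score += log(max(conf_a.get(edge_a, 0.0), floor))
        score += log(max(conf_b.get(edge_b, 0.0), floor))
    return score


def best_alignment(path_a, candidate_paths_b, similarity, conf_a, conf_b):
    """Return the candidate path in network B that scores highest against path_a."""
    return max(
        candidate_paths_b,
        key=lambda p: score_path_alignment(path_a, p, similarity, conf_a, conf_b),
    )
```

In this sketch, a candidate path scores well only when its proteins resemble their counterparts and the interactions linking them are well supported in both networks, so weak evidence on either count pulls the alignment score down.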
Ideker cautioned that there is plenty of room for improvement in the method, however. “In the case of multiple network alignment, we’d be hard pressed to do five or six species in the same way we did three species,” he said. “So there’s going to have to be more algorithmic advances there to enable us to do that.”
In addition, he said, largely due to the paucity of interaction data, there’s no equivalent of a PAM (point accepted mutation) or BLOSUM (blocks substitution matrix) substitution matrix for network alignment, which would score potential alignments based on better knowledge of how networks evolve. “So what we really need now is a lot of network data spanning many species that will let us formulate how these networks are really evolving, and how can that information be used in the alignment algorithm itself,” he said.
Ideker said that, despite its current limitations, PathBlast is still an effective tool for several research problems. One example, as the recent PNAS paper illustrates, is predicting protein functions, or even novel protein interactions, by comparing networks across multiple species.
PathBlast is also an effective noise-filtering tool for interaction networks. Unlike sequence data, which may contain only one error per 1,000 base pairs or so, the error rate for interaction data can get as high as 50 percent, Ideker said, “meaning that every other interaction you look at is probably false.”
However, he said, “If you see a region aligned between networks, you sort of have more confidence that those interactions and proteins are really occurring in the cell for some functional purpose.”
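The noise-filtering idea can be sketched in a few lines of Python. This is a deliberate simplification and not PathBlast's method: it checks single interactions rather than aligned regions, and the function name, the edge-list format, and the `orthologs` mapping are hypothetical.

```python
def conserved_interactions(edges_a, edges_b, orthologs):
    """Return the interactions in network A that are also seen, via orthologs, in network B.

    edges_a, edges_b: iterables of (protein, protein) pairs, treated as undirected
    orthologs: dict mapping a protein in A to a list of its assumed orthologs in B
    """
    # Store B's interactions as unordered pairs so (x, y) matches (y, x).
    b_edges = {frozenset(edge) for edge in edges_b}

    kept = []
    for u, v in edges_a:
        # Keep (u, v) if any ortholog of u interacts with any ortholog of v in B.
        supported = any(
            frozenset((ou, ov)) in b_edges
            for ou in orthologs.get(u, ())
            for ov in orthologs.get(v, ())
        )
        if supported:
            kept.append((u, v))
    return kept
```

An interaction that survives this filter has been observed independently in two species, which is the kind of added confidence Ideker describes. An interaction that fails it is not necessarily false, since the second network may simply be incomplete.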
Ideker and his colleagues have released a web-based PathBlast query tool (at http://www.pathblast.org/), where researchers can compare a single pathway or complex against interaction networks for seven species.
Ideker said that by early summer a downloadable version of PathBlast should be available as a plug-in for the Cytoscape software package, so that users can compare whole networks against each other.
Currently, he said, a web-based query takes “under 10 seconds,” while a whole-versus-whole network comparison can take several hours.