Most proteomics researchers have resigned themselves to accepting 2D-gel electrophoresis as a necessary and low-throughput evil — the painstaking process remains the most popular way to separate complex protein mixtures, despite numerous attempts to improve upon it, or even replace it. Although several software packages are available to automate the comparison of 2D gel profiles, “most of them require a significant amount of user interaction,” said Guang-Zhong Yang, director of the Medical Image Computing Laboratory at Imperial College London. A typical experiment, he said, might require a technician to spend hours manually lining up protein spots before the software even comes into play — not exactly a lightning-quick approach.
Yang’s team is working on a combination of methods to automate the process and “put 2D gels back onto the roadmap” of high-throughput biology. Two years ago, his lab developed an image-processing algorithm called MIR (multiresolution image registration) to automate the alignment of gel pairs. The algorithm takes only about five seconds to process a typical gel pair on a standard desktop computer, Yang said, but that only addresses part of the problem.
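The details of MIR itself aren't spelled out here, but the coarse-to-fine idea behind multiresolution registration can be sketched. The toy example below — a translation-only model, average-pooled image pyramids, and a brute-force sum-of-squared-differences search — is an illustrative assumption, not the published algorithm:

```python
import numpy as np

def downsample(img, f):
    """Average-pool the image by an integer factor f (cropping to a multiple)."""
    h, w = img.shape[0] // f * f, img.shape[1] // f * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def register_translation(ref, mov, levels=(4, 2, 1), radius=3):
    """Coarse-to-fine search for the integer (dy, dx) shift aligning mov to ref.

    At each pyramid level, the estimate from the coarser level is rescaled
    and refined by an exhaustive search within `radius` pixels.
    """
    dy = dx = 0
    for i, f in enumerate(levels):
        r, m = downsample(ref, f), downsample(mov, f)
        if i > 0:  # carry the coarser estimate up to this resolution
            dy *= levels[i - 1] // f
            dx *= levels[i - 1] // f
        best = None
        for ddy in range(-radius, radius + 1):
            for ddx in range(-radius, radius + 1):
                shifted = np.roll(m, (dy + ddy, dx + ddx), axis=(0, 1))
                cost = float(((shifted - r) ** 2).sum())
                if best is None or cost < best[0]:
                    best = (cost, dy + ddy, dx + ddx)
        _, dy, dx = best
    return dy, dx

# Synthetic "gel": a few Gaussian spots, then a shifted copy to re-align.
yy, xx = np.mgrid[0:64, 0:64]
rng = np.random.default_rng(0)
ref = np.zeros((64, 64))
for cy, cx in rng.integers(12, 52, size=(6, 2)):
    ref += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 18.0)
mov = np.roll(ref, (8, -4), axis=(0, 1))
print(register_translation(ref, mov))  # → (-8, 4)
```

Because each level only searches a small neighborhood around the coarser estimate, the cost stays low even for large displacements — the same property that lets a multiresolution method process a gel pair in seconds rather than hours.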
“From then, what you want is to do accurate quantification, because 2D gel electrophoresis is actually an analog process; so if you run the same 2D sample multiple times, you’ll end up with gels of slightly different natures — expression, distortion, background bias will all be slightly different,” Yang said.
Ideally, proteomics researchers would have access to a database of gels run multiple times under similar conditions, slightly different conditions, and even from different labs, “so that we could create what we call the statistical norm of the baseline,” Yang said. “Then, when you have a new tissue sample come in, you can compare that in the statistical sense and you’ll be able to get much more meaningful and accurate results for the true differentiation in terms of expression.”
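The article doesn't specify which statistics the proposed repository would store, but one minimal reading of a "statistical norm" is a per-spot mean and standard deviation of intensities across aligned replicate gels, against which a new sample is scored. The spot matrix and the 3-sigma threshold below are illustrative assumptions:

```python
import numpy as np

# Hypothetical aligned spot-intensity matrix: one row per replicate gel,
# one column per matched protein spot (spot alignment assumed already done).
rng = np.random.default_rng(1)
baseline = rng.normal(loc=100.0, scale=8.0, size=(20, 5))  # 20 gels, 5 spots

norm_mean = baseline.mean(axis=0)            # the "statistical norm" ...
norm_std = baseline.std(axis=0, ddof=1)      # ... and its spread

# A new sample in which spot 2 is strongly over-expressed vs. the baseline.
new_gel = norm_mean.copy()
new_gel[2] += 6 * norm_std[2]

# Score the new gel against the norm; flag spots beyond 3 standard deviations.
z = (new_gel - norm_mean) / norm_std
differential = np.flatnonzero(np.abs(z) > 3.0)
print(differential)  # → [2]
```

The point of the repository is exactly this denominator: without many replicates, run-to-run distortion and background bias are indistinguishable from true differential expression.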
The problem, he noted, is that the number of pair-wise comparisons required to build such a database is enormous — even with the speed-up provided by the new image-processing method. Recently, however, Andrew Dowsey, a researcher in Yang’s team, found that it was relatively straightforward to rework the MIR algorithm to run on a distributed computing architecture, which makes such a repository a much more realistic prospect.
Using the Condor cluster management software from the University of Wisconsin-Madison to harness the CPU cycles of 40 Windows and Unix machines, Yang’s team completed the 3,540 tasks required for the pair-wise comparison of a 60-gel experiment in just over nine minutes, compared to four and a half hours on a single machine.
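The task count checks out: aligning every ordered pair of 60 gels gives 60 × 59 = 3,540 independent jobs, which is what makes the problem so naturally parallel. Condor farms such jobs out across a cluster; as a rough local analogue (not how Condor itself works), the fan-out can be sketched with Python's multiprocessing, with a stand-in for the expensive registration step:

```python
from itertools import permutations
from multiprocessing import Pool

def register_pair(pair):
    """Stand-in for one MIR gel-pair registration job — the real task
    would load two gel images and align them; here it just echoes the pair."""
    i, j = pair
    return (i, j)

def pairwise_tasks(n_gels):
    """Every ordered pair (i, j) with i != j: n * (n - 1) tasks in total."""
    return list(permutations(range(n_gels), 2))

if __name__ == "__main__":
    tasks = pairwise_tasks(60)
    print(len(tasks))  # → 3540, the figure quoted for the 60-gel experiment
    with Pool(processes=4) as pool:       # Condor plays this role cluster-wide
        results = pool.map(register_pair, tasks)
    print(len(results))  # → 3540
```

Because the jobs share no state, the wall-clock time shrinks roughly in proportion to the number of idle CPUs harvested — consistent with the drop from four and a half hours to about nine minutes on 40 machines.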
The parallel approach now makes it possible to begin collecting statistical norms for expression patterns in a centralized repository, but that task isn’t one that Yang is planning to undertake alone. He said he is trying to establish a consortium “to address some of the key, next stages of the issue” and is also in discussions with funding agencies to support a large-scale project to create such a database.
In the meantime, a demo version of the MIR software is freely available at Yang’s “ProTurbo” website for high-throughput proteomics (http://vip.doc.ic.ac.uk/proturbo/index.php), and Yang reports “a lot of hits” from researchers as well as some queries from commercial software firms interested in licensing the technology. However, he said, “I think our priority is to make it freely available to the academic community.”
Right now, Yang said, “Everyone is interested in high-throughput systems, and 2D gel electrophoresis doesn’t really fit the bill for that.” With the right mix of new algorithms and grid technology, Yang said, his lab’s goal is to “restore the image of 2D gel electrophoresis projects to the bioinformatics community.”