An Indian government lab has outfitted the country's fastest parallel supercomputer for bioinformatics research in an attempt to coax university and industry scientists into pursuing highly computation-intensive problems in biology.
Researchers at India's Center for Development of Advanced Computing in Pune have tailored popular bioinformatics packages, which were originally developed at US universities, to run on the center's Param10000 supercomputer.
The bioinformatics team at CDAC has also developed parallel genetic algorithms for multiple sequence alignment and protein structure analysis.
"We're trying to make available high-end computing resources for bioinformaticists in India," Rajendra Joshi, bioinformatics coordinator at CDAC, told BioInform.
The researchers have already ported two molecular modeling packages, AMBER and CHARMM, onto Param10000, a parallel machine built from multiple Sun UltraSparc nodes and designed for a peak performance of 100 Gflops.
Param was India's answer to technology embargos. In the mid-1980s, India was denied supercomputers by the US and Japan on the grounds that they would be used for its nuclear and missile programs.
In response, government labs in India, including CDAC, designed parallel supercomputers, procuring off-the-shelf processors and writing codes that distribute problems across multiple processors.
The Param, installed in over 25 universities and labs across India, is now routinely used in meteorology, seismic analysis, oil prospecting, and fluid dynamics.
"We're hoping that an opportunity to exploit Param for bioinformatics will encourage scientists here to take up problems they might have shirked earlier," Joshi said.
CDAC itself may have produced the first bioinformatics research results on Param: an insight into the mechanisms that underlie the trinucleotide repeats associated with Huntington's disease, a neurodegenerative disorder.
Joshi used a parallelized version of AMBER to simulate the 3D structure of CAG, the trinucleotide repeat associated with Huntington's disease. While this trinucleotide repeat occurs up to 35 times in the general population, individuals with Huntington's disease may have up to 121 such repeats.
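To make the repeat counts concrete, here is a minimal sketch of how the longest uninterrupted CAG tract in a DNA sequence might be counted. It is not CDAC's code and has nothing to do with the AMBER simulation itself; the function names are hypothetical, and the threshold of 35 is simply the general-population figure quoted above, not a clinical cutoff.

```python
import re

# Upper bound for the general population, as quoted in the article (assumption:
# used here only as an illustrative threshold, not as a diagnostic criterion).
GENERAL_POPULATION_MAX = 35


def longest_cag_run(sequence: str) -> int:
    """Return the length, in repeat units, of the longest uninterrupted CAG tract."""
    # (?:CAG)+ matches one or more back-to-back CAG triplets.
    tracts = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(tract) // 3 for tract in tracts), default=0)


def exceeds_general_range(sequence: str) -> bool:
    """True if the longest CAG tract is longer than the quoted population maximum."""
    return longest_cag_run(sequence) > GENERAL_POPULATION_MAX


# Toy sequences, invented for illustration only.
short_tract = "ATG" + "CAG" * 20 + "CCGTAA"
long_tract = "ATG" + "CAG" * 40 + "CCGTAA"
```

With these toy inputs, longest_cag_run(short_tract) gives 20 and longest_cag_run(long_tract) gives 40, so only the second exceeds the quoted range.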
"Structural studies suggest that this trinucleotide sequence is kind of predisposed to repeating itself," said Joshi.
The CAG repeat studies involved simulating the behavior of 16,000 atoms over a nanosecond, a simulation that lasted 96 hours on a 16-processor Param. It could have taken up to eight weeks on conventional single-processor workstations that researchers typically use in India.
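As a rough check on those figures, the sketch below works out the implied speedup and parallel efficiency. It treats "up to eight weeks" as an upper bound on the single-processor time and assumes the workstation would run around the clock; the function names are illustrative only.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-processor run time to parallel run time."""
    return serial_hours / parallel_hours


def efficiency(speedup_factor: float, processors: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup_factor / processors


# Figures from the article; the serial time is an upper bound ("up to eight weeks").
serial_hours = 8 * 7 * 24      # eight weeks of continuous running = 1344 hours
parallel_hours = 96            # reported run time on a 16-processor Param
processors = 16

s = speedup(serial_hours, parallel_hours)   # 14.0
e = efficiency(s, processors)               # 0.875

print(f"speedup: at most {s:.1f}x, efficiency: at most {e:.1%}")
```

On these numbers the 16-processor run is at most 14 times faster than the workstation, or about 87.5 percent of ideal linear scaling. Because the two machines use different processors, this is a comparison of turnaround time, not a strict measure of how well the code scales.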
Param's configuration would depend on the problem to be solved. An eight-node cluster with 32 processors would be good enough for simulations involving protein molecules, DNA-protein complexes, and drug molecules bound to proteins.
The CDAC team is also working on parallel codes for genome sequence analysis. And a parallel genetic algorithm that CDAC bioinformaticist Lourdusamy Anbarasu has developed is qualitatively better at multiple sequence alignment than ClustalW or sequential genetic algorithms, according to Anbarasu.
A special user interface for bioinformatics on Param will allow scientists to work on the system without having to learn parallel processing. "For most scientists, it will be business as usual, but a lot faster," Joshi said.