Platform Computing signed on one of the industry’s highest-profile compute farm customers this week in a deal that places Platform LSF and Platform Analyzer software at the heart of Incyte Genomics’ 1,200-CPU Linux and Unix cluster.
The project is significant for Platform, according to Yury Rozenman, director of life sciences business development, because it wraps up the company’s “portfolio of the largest genome centers.” Platform already boasts the Wellcome Trust Sanger Institute, the Whitehead Institute, the European Bioinformatics Institute, and Celera Genomics among its customers.
Stu Jackson, director of bioinformatics at Incyte Genomics, told BioInform that the company’s current load management system, a package Incyte’s bioinformatics team built from scratch three years ago to run dedicated applications, will “coexist” with Platform LSF. While Incyte’s home-grown system is “an extremely powerful package, and way ahead of its time,” Jackson said that as the company’s research needs have changed in step with its move toward drug development, greater flexibility had to be added to the compute farm’s capabilities.
As an example, Jackson cited the analysis Incyte is undertaking for its new LifeSeq Foundation product, a genome-centric view of EST and expression data. “The kind of algorithms that you use in bioinformatics to work with genomic data have completely different performance profiles than the ones that you use for EST data,” Jackson said. “We’re looking at dealing with rather larger volumes of data … than before.”
In addition to the 1,200-CPU sequence analysis cluster, Jackson said an even more vital application area for Platform LSF would be the company’s full-length gene pipeline, a 90-CPU Alpha server environment that previously lacked a multi-computer scheduling system. Instead, users were manually scheduling jobs after hours, Jackson said.
Before LSF, the full-length gene editors and data miners in the pipeline “stood up and yelled around the cubes to find out which machine was the least loaded,” Jackson said. “Needless to say,” he added, as the full-length gene project has grown over the last year, “that sort of load balancing stopped working effectively.”
First Step: Performance Assessment
Incyte first tested the performance of its current cluster environment with Platform Analyzer software, which collects baseline utilization statistics such as system load, CPU usage, pending jobs by user, and job pending time. Rozenman said the tool is a useful way for Platform to make recommendations for projects, but many customers choose to keep it as part of their compute farm environment to help them track projects and reconfigure resources effectively.
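For readers unfamiliar with such baseline metrics, the following is a minimal Python sketch of how two of them, pending jobs by user and job pending time, could be derived from a list of job records. It is an illustration only and does not represent how Platform Analyzer works; the `Job` record and the function names `pending_jobs_by_user` and `mean_pending_time` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical job record; not a Platform LSF or Analyzer data structure."""
    user: str
    submit_time: float            # seconds since an arbitrary reference point
    start_time: Optional[float]   # None while the job is still pending


def pending_jobs_by_user(jobs):
    """Count jobs that have not started yet, grouped by submitting user."""
    return Counter(j.user for j in jobs if j.start_time is None)


def mean_pending_time(jobs, now):
    """Average wait in seconds; jobs still pending are counted as waiting until `now`."""
    waits = [
        (j.start_time if j.start_time is not None else now) - j.submit_time
        for j in jobs
    ]
    return sum(waits) / len(waits) if waits else 0.0


# Made-up sample data for illustration only.
jobs = [
    Job("alice", 0.0, 30.0),
    Job("bob", 10.0, None),
    Job("bob", 20.0, None),
]

print(pending_jobs_by_user(jobs))
print(mean_pending_time(jobs, now=60.0))
```

Tracked over time, figures like these indicate whether a queue is backing up for particular users or for the farm as a whole.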
Rozenman said that while a number of pharmaceutical companies use Platform Analyzer, Incyte is the first genomics firm to opt for the tool. With the downturn in the economy, “people don’t want to spend money on new hardware so it’s a great time to make the most of their existing computational environments,” said Rozenman. In addition, he said, “as business models change in the industry, companies will need more flexibility in compute resources to meet the challenges of new applications and functions.”
While Jackson was unable to disclose particular productivity numbers, he said that Incyte’s compute farm utilization under LSF and Analyzer is “significantly higher” than it was before.
Installation of LSF on the full-length gene environment was completed at the end of 2001, Jackson said, and implementation on the 1,200-CPU cluster is just beginning.
Jackson said that Incyte has no immediate plans to expand its compute farm, largely due to the performance gains provided by the new management software. “We can get more out of the same hardware. We’re able to use the machines we have far more efficiently now,” he said.