Virginia Bioinformatics Institute is partnering with Convey Computer to develop new reconfigurable computing platforms based on field-programmable gate arrays that will be used for life sciences and biomedical research at the institution.
VBI intends to use a $1.3 million grant from the National Science Foundation’s Strategic Technologies for Cyberinfrastructure program to design and build high-performance scalable clusters using Convey's FPGA-based systems and complementary technologies.
According to the grant abstract, VBI and Convey intend to "create an extensible FPGA-based cluster as an expansion to existing FPGA modules that is balanced with a data storage system to address the needs that are typical of life/medical science applications."
When it is up and running, the system will allow developers to track efficiency and usage patterns, and to create tools that will be immediately deployed via the web for research use, VBI said.
VBI's new cluster will include five or six machines, adding to the 250 servers already installed at the institute, comprising FPGA-based systems, standard Dell processors, and graphics processing units.
The new hardware will add to a "hybrid-core" computing cluster, dubbed Shadowfax, consisting of three machines from Convey, which VBI installed last year to analyze data from the National Human Genome Research Institute's 1000 Genomes Project and carry out other tasks including text-mining projects and efforts to simulate the spread of infectious disease through populations (BI 05/28/2010).
Among its features, the system provides a total of eight compute nodes, each with a dual-socket Intel Xeon X5670 2.9 GHz six-core processor, dual 146-gigabyte SAS drives, and 48 gigabytes of DDR 1333 MHz random access memory. It also includes a PAS7 parallel file system storage system from Panasas.
That system is part of VBI's supercomputing program in which participants pool funds to purchase nodes within the supercomputing cluster for their research needs. In addition to hardware, VBI provides network capacity, system software, clustering software, and temporary storage for use during computational runs, among other features.
The system also includes algorithms, such as those for DNA sequence alignment, that are optimized and translated into code that can be loaded onto the FPGAs at runtime to accelerate the applications that use them.
Since the infrastructure was deployed, VBI has used it for several research projects, including decision and policy informatics, microsatellite analysis, gene annotation, and text data mining.
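The article does not say which alignment algorithms are involved. As a purely illustrative sketch, the Python below scores a local alignment with the classic Smith-Waterman dynamic-programming recurrence, the kind of regular, cell-by-cell computation that lends itself to hardware acceleration. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration and are not taken from Convey's or VBI's software.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix in memory. Scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbors, so cells along an anti-diagonal can be computed in parallel, which is one reason this family of algorithms is a common target for FPGA implementations.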
This year, VBI will build its cluster with a newer version of Convey's platform, dubbed HC-1EX, that uses improved FPGAs providing more capacity and performance, Bruce Toal, Convey's co-founder and CEO, told BioInform.
The system includes three times as many usable logic gates as the HC-1, providing additional hardware for users to implement more functions and more complex portions of their applications in the coprocessor.
Harold Garner, VBI's executive director, told BioInform that the hardware will support the increasing quantities of data generated in house, as well as information coming into the institute from the National Cancer Institute's Cancer Genome Atlas, the 1000 Genomes Project, and BGI.
Among other projects, VBI is analyzing repetitive portions of DNA, of which there are 2 million in the human genome, Garner said. Researchers are exploring new genomes being sequenced in the 1000 Genomes Project and the Cancer Genome Atlas to discover these sequences and to identify any differences between cancerous and normal samples, he explained.
These kinds of analyses are computationally intensive, not to mention data intensive, Garner said, estimating that the Cancer Genome Atlas and the 1000 Genomes Project generate roughly 10 petabytes of data per day.
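To give a rough sense of what searching for repetitive DNA involves, the following Python sketch scans a sequence for short tandem repeats (microsatellites) using a regular expression with a backreference. It is a toy illustration only, not VBI's method: the function name, the unit-length limits, and the minimum copy count are all assumptions, and production tools handle imperfect repeats and genome-scale input that this sketch does not.

```python
import re


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Find perfect tandem repeats in a DNA sequence.

    Returns a list of (start, unit, copies) tuples. The thresholds are
    illustrative assumptions, not values from any published pipeline.
    """
    # The lazy quantifier makes the regex try the shortest repeat unit
    # first, so "AAAA" is reported as unit "A" rather than unit "AA".
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    repeats = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        repeats.append((m.start(), unit, copies))
    return repeats


if __name__ == "__main__":
    for start, unit, copies in find_tandem_repeats("GGCACACACATT"):
        print(start, unit, copies)
```

Comparing the copy counts reported at the same locus in a tumor sample and a matched normal sample is one simple way such output could be used to flag differences, though the article does not describe how VBI performs that comparison.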
Furthermore, "it turns out that the kinds of computer codes that we are using in the life and medical sciences to analyze data … are fundamentally different from the codes that you would normally see in engineering and physics" and require specialized coding techniques, he explained.
Because the volume of data is only going to increase, it has become necessary to increase VBI's compute power, Garner said, adding that the institute was looking for hardware that could work well in a cluster and that offered a high calculation density and a large memory footprint.
"The Convey hybrid core machines are really paradigm-shifting hardware and software that allows us to change the scaling laws for how many computers we are putting [in] to do the work," he said.
VBI and Convey "will be taking a number of computer codes that are used heavily in biology and medicine and transferring them to the new generation of Convey hybrid core machines and then optimizing their performance and studying their speed relative to normal processors," he explained.
He said that once the codes are optimized on the hardware, they can run 100 to 200 times faster on Convey's systems than on standard microprocessor-based servers.
The new machines will be added to VBI's current infrastructure sometime after Sept. 1.
Garner also said that VBI plans to double its storage capacity, which currently includes disk arrays from DataDirect Networks and Panasas, and a recently installed Sun Oracle StorageTek array that provides up to 50 petabytes of storage.