Convey Computer said this week that an exact-match short-read aligner developed by students at Iowa State University and run on Convey’s HC-1 hardware has won first place in a hardware/software design challenge sponsored by the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers.
The winners presented their method, a hash algorithm dubbed Shepard, at the 10th ACM/IEEE International Conference on Formal Methods and Models for Codesign, or MemoCODE, which was held in Arlington, Va., July 16-17.
According to Convey, the students’ solution, which ran on the company’s field-programmable gate array technology, was more than 24 times faster than the second-place entry in the contest, a Burrows-Wheeler/hash hybrid that was developed by researchers at the High Performance Computing Laboratory at the Institute for Research in Fundamental Science in Iran and ran on a 12-core Intel system.
The ISU team also ran its algorithm on an NVidia graphics processing unit-based infrastructure and had the fastest implementation on that system as well, George Vacek, Convey’s director of life sciences, told BioInform.
Shepard contains two components: software that preprocesses a reference genome into a hash table and a hardware pipeline, which runs on the Convey HC-1, for performing fast lookups.
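The two-stage design described above can be illustrated with a minimal software sketch. This is not the ISU team's implementation: the function names build_index and lookup, the toy reference string, and the read length of 4 are all hypothetical, and a Python dictionary stands in for both the precomputed hash table and the hardware lookup pipeline.

```python
from collections import defaultdict


def build_index(reference, read_len):
    """Preprocessing step: map every read_len-long substring of the
    reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - read_len + 1):
        index[reference[pos:pos + read_len]].append(pos)
    return index


def lookup(index, read):
    """Lookup step: return every reference position where the read
    matches exactly, or an empty list if it does not occur."""
    return index.get(read, [])


# Toy data (hypothetical); real reads in the challenge were 100 bases long.
reference = "ACGTACGTTAGC"
index = build_index(reference, read_len=4)

hits = lookup(index, "ACGT")    # read that occurs twice in the reference
misses = lookup(index, "GGGG")  # read that does not occur at all
print(hits, misses)
```

The sketch shows why the approach trades memory for speed: all of the work of scanning the reference is done once, up front, so each read is resolved with a single table lookup.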
As noted in a paper describing the algorithm, Shepard's runtime was 895 milliseconds for processing 284,881,619 short-read sequences in hardware, though this time "omits the one-time costs associated with creating and loading a 22.5 GB hash table into memory and loading a 780 MB reference genome to memory." It also does not include the time it took to load the reads from disk into memory.
The paper adds that on a "commodity server" with 48 GB of available memory, the creation of the hash table took 147 minutes.
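The figures quoted from the paper can be turned into a rough throughput estimate. The calculation below is back-of-the-envelope arithmetic on the reported numbers only, not a result from the paper; the variable names are illustrative, and the 100-base read length is taken from the challenge description.

```python
# Figures reported in the paper, as quoted above.
reads = 284_881_619          # short reads processed in hardware
runtime_s = 0.895            # 895 milliseconds
table_build_s = 147 * 60     # 147 minutes to create the hash table

# Assumption: every read is 100 bases long, per the challenge description.
read_len_bp = 100

reads_per_second = reads / runtime_s
bases_per_second = reads_per_second * read_len_bp
build_to_lookup_ratio = table_build_s / runtime_s

print(f"{reads_per_second:,.0f} reads/s")
print(f"{bases_per_second:,.0f} bases/s")
print(f"hash table creation took {build_to_lookup_ratio:,.0f}x the lookup time")
```

On these numbers, the hardware pipeline handled about 318 million reads per second, while the one-time hash table creation took roughly 9,850 times as long as the lookups themselves.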
The MemoCODE conference focuses on methods and techniques for designing improved hardware and software systems that address timing, power, costs, and reliability issues.
Each year, the conference holds the MemoCODE challenge, which poses a specific design problem and invites teams from around the world to build hardware and software systems to address it.
This year, the organizers selected a bioinformatics-related challenge — aligning genomic sequences to a reference dataset.
For the challenge, participants were expected to efficiently map millions of 100-base-pair short-read sequences to a reference human genome of 3.1 billion base pairs from the 1000 Genomes Project using string-matching approaches.
According to a white paper describing the MemoCODE challenge, a total of nine teams from six institutions submitted working solutions. The challenge began on March 1 and closed a month later. Entries were judged on runtime and on the cost of the hardware used to implement the algorithms.
The ISU team, led by Phillip Jones and Joseph Zambreno, split into two groups, each using a different coprocessor technology: Jones’ group selected the Convey HC-1, while Zambreno’s group used graphics processing units.
However, the FPGAs proved to be the better option for the alignment problem. “This particular challenge had a big memory bandwidth issue — and having local memory was vital,” Zambreno explained in a statement. “By their nature, GPUs are fairly limited to how much on-chip, easily accessible memory is available.”
With Convey’s infrastructure, “we won because we were able to get 80 gigabytes of memory bandwidth,” Kevin Townsend, one of the graduate students on the ISU team, said in a statement.
Furthermore, “the Convey system makes it easy to develop algorithms because of its design and toolset,” Jones said in a statement. “And from a user’s point of view, the Convey system simplifies how to get access to memory.”
The ISU team expects its algorithm to find use in domains outside bioinformatics, such as data mining, social graphs, and search optimization. The developers could not be reached for further comment.
Vacek told BioInform that ISU has been a Convey customer since 2010.