
Convey Claims 8.4-fold Speed-up of Short-Read Assembly on Hybrid-Core Architecture


By Uduak Grace Thomas

Convey Computer said this week that its latest software, GraphConstructor, accelerates de novo genome assembly up to 8.4-fold on its hybrid-core architecture.

GraphConstructor speeds up the construction and manipulation of de Bruijn graphs, which are used in short-read genome assembly algorithms such as Velvet and ABySS. The Richardson, Texas-based company said that customers have used the software on its HC-1 and HC-1ex computers to perform short-read assembly up to 8.4 times faster than on standard architectures.

The company's hybrid-core systems combine Intel x86 microprocessors with a coprocessor composed of field-programmable gate arrays (FPGAs) to accelerate computationally intensive tasks. GraphConstructor implements the read-hashing step of de Bruijn graph construction on the coprocessor to reduce the memory required to build the graph.

While sequence-alignment algorithms like Smith-Waterman are compute-intensive, memory is often the limiting factor for short-read assembly algorithms, George Vacek, Convey's director of life sciences, explained to BioInform.

De Bruijn-based algorithms create k-mer occurrence hash tables that require random access to memory, which is "challenging" for x86-based systems, especially for very large genomes.
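To make the k-mer hashing step concrete, here is a minimal Python sketch of how a de Bruijn assembler might count k-mers and link them into a graph. It is an illustration under simplifying assumptions (no reverse complements, no compact k-mer encoding, error filtering reduced to a count threshold), not the code used by Velvet, ABySS, or GraphConstructor, and the function names are hypothetical. Each dictionary lookup lands at an effectively unpredictable memory address, which is the random-access pattern Vacek describes.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Build a k-mer occurrence table mapping each k-mer to its count."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=1):
    """Link each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix.

    Hypothetical helper: nodes are (k-1)-mers and each retained k-mer is an
    edge, one common formulation of the de Bruijn graph.
    """
    graph = defaultdict(list)
    for kmer, count in kmer_counts.items():
        # Crude stand-in for error filtering: drop rare k-mers.
        if count >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return graph


reads = ["ACGTAC", "GTACGT"]
kmers = count_kmers(reads, k=4)
graph = build_de_bruijn(kmers)
print(kmers["ACGT"])  # 2
print(dict(graph))
# {'ACG': ['CGT'], 'CGT': ['GTA'], 'GTA': ['TAC'], 'TAC': ['ACG']}
```

With two overlapping reads and k = 4, the four distinct k-mers chain into a single cycle. At the scale of a mammalian genome or a metagenome, the occurrence table alone can hold billions of entries, which is why memory rather than arithmetic becomes the bottleneck.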

Standard processors work on a "cache line" basis, Vacek said. Although a user may need only one word of memory, the system fetches the eight words that make up the cache line and discards those that aren't used, so effectively only one-eighth of the total bandwidth does useful work.
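Vacek's one-in-eight figure can be written as a simple ratio. The sketch below assumes a hypothetical peak bandwidth of 80 GB/s purely for illustration and ignores latency, prefetching, and any later reuse of the other words in a line; it only shows why fetching eight words to use one leaves one-eighth of the bandwidth doing useful work.

```python
def effective_bandwidth(peak_gb_per_s, words_used, words_per_fetch):
    """Useful bandwidth when only words_used of each fetch are needed.

    words_per_fetch is how many words the memory system moves per request:
    8 for the cache line in Vacek's example, 1 for single-word access.
    """
    return peak_gb_per_s * words_used / words_per_fetch


PEAK = 80.0  # hypothetical peak bandwidth in GB/s, illustration only

cache_line = effective_bandwidth(PEAK, words_used=1, words_per_fetch=8)
single_word = effective_bandwidth(PEAK, words_used=1, words_per_fetch=1)
print(cache_line, single_word)  # 10.0 80.0
```

The same model explains the appeal of single-word access for hash-table workloads: when successive lookups almost never fall in the same cache line, the unused seven-eighths of each fetch is pure overhead.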

"The computer isn't waiting for the logic gates to cycle; it is waiting for pieces of data to come back from memory so that it can operate on it," Vacek said. "With our system, we actually have a high-end memory subsystem that's highly parallel and can do single-word access to memory so we get a much higher effective bandwidth."

Convey said that its highly parallel memory subsystem allows application-specific logic to concurrently access 8,192 individual words in memory, which increases the effective memory bandwidth over cache-based systems.

GraphConstructor is available for customers who currently own Convey's hardware. The software can be run in a standalone fashion or in conjunction with Velvet to get identical results with a lower memory requirement.

Pricing starts at $40,000 for a single node with a "reasonable" amount of memory and runs to around $90,000 for an HC-1ex with a large memory.

Memory Problems

Velvet developer Daniel Zerbino, a researcher at the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz, said in a statement that he and his colleagues didn't address hardware footprint when they developed the algorithm in 2006.

"Memory size is the biggest difficulty," he said. "If your machine doesn't have enough memory, you must break down the problem, and that can be quite a constraint." He added that Convey's new software should help researchers "who want to test more parameters to achieve better assemblies or look at bigger jobs such as metagenomic or mammalian genome samples."

In tests run at the Department of Energy's Joint Genome Institute, investigators compared GraphConstructor on the HC-1 and Velvet on a standard architecture using microbial genome datasets sequenced on an Illumina platform. The results of the comparison were presented in a poster at JGI's user meeting in March.

In the study, GraphConstructor was run on the HC-1 system, which has a Xeon L5408 host server, 128 GB of RAM, and a coprocessor composed of four Xilinx V5LX330 FPGAs. Velvet was run on a Sun Fire X4640 with 2.6 GHz Opteron 8435 processors and 512 GB of RAM.

When both programs were run on six microbial genomes and one fungal genome ranging in size from 2.0 to 33.5 megabases, GraphConstructor achieved a two-fold speed-up over Velvet, Convey said.

Using datasets from the cow rumen metagenome ranging from 10 Gb to 160 Gb, GraphConstructor achieved speed-ups of 2.2-fold to 2.8-fold and reduced memory usage ranging from 29 to 82 percent of that required by Velvet for the same datasets.

Without GraphConstructor, Velvet took five days to assemble a dataset of 1.9 billion reads totaling 160 Gb on a 32-core system with 1 terabyte of RAM, Convey said. With GraphConstructor, the same assembly took 1.5 days.
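The reported run times translate into a speed-up factor as the ratio of the two wall-clock times. This small Python calculation uses only the figures Convey gave; the article does not say whether the two runs used comparable hardware, so the result describes the reported runs rather than the architectures in general.

```python
def speedup(baseline_time, new_time):
    """Speed-up factor: how many times faster the new run finished."""
    return baseline_time / new_time


# Figures reported by Convey for the 1.9-billion-read, 160 Gb dataset.
velvet_days = 5.0
graphconstructor_days = 1.5
print(round(speedup(velvet_days, graphconstructor_days), 2))  # 3.33
```

Five days versus 1.5 days works out to roughly a 3.3-fold reduction in wall-clock time for that single dataset.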

Convey also reported that researchers at the University of Mainz assembled 300 million reads from the Riesling grape using a Convey system running GraphConstructor, while an existing computer system running Velvet and SOAPdenovo didn't have enough memory to complete the same assembly.

An Important Market

Although Convey doesn't focus exclusively on life sciences (it also targets the oil and gas and financial industries), the company identified the sector as one of the first markets for its HC-1 system when it began shipping the product in late 2009. Among its first customers was the University of California, San Diego, which received a beta version of the HC-1 with the goal of accelerating the performance of its InsPecT/MS-Alignment proteomics software package (BI 11/20/2009). The company also counts the Virginia Bioinformatics Institute among its customers.

The company released the HC-1ex last November. The system includes upgrades to the host server and coprocessor as well as two to three times the number of usable logic gates, which yields two- to three-fold faster run times for applications such as the Smith-Waterman algorithm, Vacek said.

Vacek said that some life sciences groups have purchased the HC-1ex system but could not provide specific information about customers.

Convey faces competition from other firms that are turning to FPGAs to speed up bioinformatics algorithms. Earlier this year, DRC Computer announced that an implementation of Smith-Waterman on its Accelium FPGA coprocessors was the first version of the sequence-alignment algorithm to achieve several trillion cell updates per second (BI 2/4/2011).

In addition, Pervasive Software last year reported that its implementation of the Smith-Waterman algorithm, based on its Pervasive DataRush parallel processing platform, surpassed other implementations of the algorithm on a standard CPU architecture by 43 percent (BI 10/1/2010).
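The competing products above all accelerate Smith-Waterman local alignment, and DRC's benchmark is quoted in cell updates per second (CUPS): each entry filled in the alignment's dynamic-programming matrix counts as one cell update. The following plain-CPU Python sketch shows where that unit comes from. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration, and this is not DRC's, Pervasive's, or Convey's implementation; an interpreted loop like this runs many orders of magnitude below the several trillion CUPS reported for FPGA hardware.

```python
import time


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score via Smith-Waterman with a linear gap penalty.

    Fills a len(a) x len(b) grid of cells row by row, keeping only the
    previous row; each cell filled is one "cell update".
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def cells_per_second(a, b):
    """Return (score, cell updates per second) for one alignment."""
    start = time.perf_counter()
    score = smith_waterman_score(a, b)
    # Guard against a zero reading on a coarse clock.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return score, len(a) * len(b) / elapsed


score, cups = cells_per_second("ACGT" * 10, "ACGT" * 10)
print(score)  # 80
print(f"{cups:,.0f} cell updates per second")
```

Because the work grows with the product of the two sequence lengths, the matrix fill is what hardware vendors parallelize, and CUPS is the natural throughput measure for comparing their implementations.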

Although Pervasive "did a fine job" of implementing Smith-Waterman, it's "a very expensive solution," Bruce Toal, Convey's president and CEO, said. "We want to deliver cheaper platforms that are faster. Hence a 2U server in a rack can deliver the performance of these very large multi-rack systems like an SGI Altix."

The reduced cost, he said, is due in part to Convey's decision to build its platform on the x86 architecture, which is "the most widely used ecosystem for computation in the world and … allows us to start with commodity components that contribute to lower cost of ownership."

Vacek noted that Convey's offerings have some important features that distinguish them from their competitors.

For starters, "we offer a true coprocessor ... [that] goes right into the socket and is tightly integrated with the host processor," he said. "Any other FPGA or GPU accelerated solution are all input/output solutions ... [or] boards that go into a PCI slot and don't have a lot of logic space on them."

These systems also have less memory, he said, and suffer from bandwidth limitations when computation is sped up, while for Convey, "the amount of space we have to accelerate an application is much higher."

Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.