NEW YORK (GenomeWeb News) — A team of bioinformaticists at the University of Maryland is using technology originally developed for the gaming community to help analyze vast amounts of short-read data from next-generation sequencers.
The approach relies on 3D graphics hardware called graphics processing units, or GPUs, to accelerate a range of applications. The UMD team believes that the technology can find a home in the next-generation sequencing and bioinformatics communities.
“As sequencing technology is getting very cheap with the new methods, we were concerned that the computing resources required to process the data wouldn’t be getting cheaper,” UMD’s Cole Trapnell told GenomeWeb Daily News sister publication BioInform.
“You may be generating all this sequence, but you might need a supercomputer to deal with it, so we liked the idea that graphics cards offered a researcher without a $100,000 IT budget the ability to process the data,” he added.
In a paper published earlier this month in BMC Bioinformatics, Trapnell and UMD colleague Michael Schatz discuss a version of the MUMmer sequence-alignment program that they developed to run on a graphics card from nVidia.
The authors report that the program, called MUMmerGPU, gained speed relative to the CPU version as read lengths became shorter. For instance, MUMmerGPU was twice as fast as the CPU version for a query length of around 800 base pairs, while a query length of 25 base pairs yielded a more than 10-fold improvement.
Although the largest acceleration was seen only for very short queries, “these read characteristics are beginning to dominate the marketplace for genome sequencing,” the authors note in the paper. They cite as an example Solexa’s sequencer, which creates around 20 million 50-base-pair reads in a single run. “Thus our application should perform extremely well on workloads commonly found in the near future,” the authors report.
MUMmerGPU, which is available for download, is well suited for aligning many reads to a reference genome, a task with great utility for genotyping, for example. Schatz told BioInform that the team next plans to apply the approach to de novo assembly of short-read data.
“De novo assembly is the big target. That’s what’s on everybody’s minds,” he said. “You’re talking about millions and millions and millions of reads being produced by centers that don’t have supercomputers, so GPUs seem like an ideal fit for that.”
The complete version of this article appears in the current issue of BioInform.