When Indiana University installs the fastest university-owned supercomputer in the US next spring, it will not only boost IU's compute capacity 25-fold, but also increase "the amount of science that we're going to be able to get done" by a similar factor, says Bill Barnett, director of the Science Community Tools group at Indiana.
Barnett, who also leads IU's National Center for Genome Analysis Support, is working to ensure that the petaflop-scale Big Red II supercomputer will support a range of bioinformatics tasks.
The university plans to install the Cray system next spring, with broad availability targeted for May 2013. Big Red II will replace the university's current Big Red system, which was installed in 2006.
Big Red II will dwarf its predecessor, with more than 21,000 processor cores compared to Big Red's 4,100. In another departure from the first Big Red, the new system will have a hybrid architecture, comprising x86 processors as well as GPU accelerators.
This hybrid structure will come in handy for at least one ongoing project at Indiana, Barnett says. Andrew Saykin, director of the IU Center for Neuroimaging, is working to integrate brain imaging with genomics to study neurological and psychiatric disorders such as Alzheimer's disease and schizophrenia.
Barnett's team is working with Saykin to construct analysis workflows for combined imaging and genomics datasets. Image processing "works really well in a GPU architecture," while most genomics analysis is better suited to an x86 architecture, "so we're anticipating being able to construct workflows that will run on the same platform that will be able to do these integrated analyses of imaging and genomics," Barnett says.
Indiana researchers also plan to exploit Big Red II's Gemini interconnect, which is expected to be three times faster than Big Red's.
In particular, Thomas Sterling and Andrew Lumsdaine of IU's Center for Research in Extreme Scale Technologies plan to develop faster graph-based processing for applications like de novo genome assemblers that rely on de Bruijn graphs.
"We could potentially make applications like ABySS or other kinds of parallelized assemblers work much more rapidly than before," Barnett says. Many de novo assemblers "are basically serial applications, so they depend upon a large amount of memory," he says. "If we can parallelize that and run it rapidly across a bunch of nodes that are interconnected ... we'll be able to potentially use different kinds of architectures and speed up the ability to do assembly."
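The de Bruijn graphs these assemblers rely on can be sketched simply: reads are chopped into overlapping k-mers, and each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. Distributed assemblers such as ABySS partition this graph across nodes, which is why interconnect speed matters to the approach Barnett describes. Below is a minimal, illustrative construction in Python; it is not ABySS's actual implementation, just a toy version of the underlying data structure:

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each k-mer in each read adds one edge from its prefix to its suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix edge
    return graph

# Two overlapping toy reads; a real assembler would see millions.
g = de_bruijn_graph(["ACGTAC", "GTACGT"], k=3)
```

In a parallel assembler, the k-mers are typically hashed across compute nodes, so traversing the graph to reconstruct contigs generates heavy node-to-node traffic — the workload a faster interconnect like Gemini is meant to accelerate.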
In addition to these research projects, Barnett anticipates making Big Red II's capabilities available to the broader biological research community. Currently, most genomics computing jobs at Indiana are run on Mason, a 16-node large-memory Hewlett-Packard system with 512 gigabytes of RAM per node, so the new machine will dramatically increase the university's life science computing capacity.
Barnett says that Big Red II will complement the Mason system. The two computers will share a file system so that researchers can easily move jobs between the two machines.
Mason's large-memory architecture is well suited to assembly, for example, while Big Red II's highly parallel architecture makes it better for applications like Blast. "Say you get sequences back from a sequencing center," Barnett says. "You'll be able to put them into our file system, assemble them on Mason, and then move to doing Blast in a much more efficient fashion on a system like Big Red II, which is architected for those kinds of parallel applications."
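The reason Blast maps so well onto a highly parallel machine is that each query sequence can be searched against the database independently, so a large query set can simply be split into chunks and farmed out to separate nodes. The sketch below illustrates that splitting step; the function names and the round-robin scheme are hypothetical, not IU's actual pipeline:

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records, header, seq = [], None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

def split_fasta(records, n_chunks):
    """Round-robin records into n_chunks lists, one per independent job."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks

# Four toy query sequences split across two hypothetical compute nodes.
fasta = """>q1
ACGT
>q2
GGCC
>q3
TTAA
>q4
CATG"""
records = parse_fasta(fasta)
chunks = split_fasta(records, 2)
```

Each chunk would then be written out and submitted as its own Blast job, with results concatenated afterward — an embarrassingly parallel pattern that needs little memory per node but benefits directly from Big Red II's core count.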