The Digital Science Center at Indiana University's Pervasive Technology Institute has been awarded a three-year, $1.4 million grant from the National Science Foundation to build a national center that will provide genome assembly software, data analysis tools, and consulting services to NSF-funded researchers.
The new center, dubbed the National Center for Genome Analysis Support, or NCGAS, will provide a core of experts, software, and hardware to meet what IU describes as a perceived need for these resources among researchers involved in genome assembly and analysis.
IU will begin allocating time on NCGAS' resources in January 2012, William Barnett, senior manager for life sciences for IU's research technologies group and a co-principal investigator on the grant, told BioInform.
NCGAS tools and services will support de novo genome assembly-based tasks as well as metagenomic and resequencing projects. The center will also provide consulting services for scientists who are granted time on its infrastructure.
Specifically, NCGAS will offer access to cluster-based genome analysis and assembly software, including SOAPdenovo, Velvet, ABySS, the Celera assembler, Newbler, Allpaths, and Arachne 2. It will also provide some storage for submitted data sets and act as a repository for open source analysis software.
Barnett said that any active NSF-funded project that's involved in genomics-based biology research would have access to NCGAS resources.
Investigators can apply for access through an online allocations process, overseen by a panel of scientists. Once the projects are approved, NCGAS compute resources will be available via a remote login and web-based workflow tools.
Barnett said that the planned combination of expertise, hardware, and software will address the widening "gap" between the quantities of genomic sequence data generated and the ability of non-expert scientists to computationally process that information.
"Even though there are [other] resources and facilities" at various institutions, "there is no resource that solves this particular problem," he said.
Separately, NCGAS aims to implement a public/private service partnership for genome analysis on a fee-for-service basis that will serve as a lower-priced alternative to commercial cloud-based services.
The details of that arrangement are still under wraps, Barnett said, declining to provide further comment.
Whatever its cloud computing plans are, IU does have some prior experience in the space.
Nearly two years ago, it received $1.5 million from the National Institutes of Health to apply cloud computing to life science research in an attempt to dodge computational bottlenecks such as long computation times and large memory requirements (BI 12/3/2009).
Prepping to be Good Hosts
To make itself an attractive host for the NSF's new research resource, IU invested in new hardware and has developed "high-speed data management systems," Barnett said.
In addition to its own infrastructure, the university is teaming up with the Texas Advanced Computing Center and the San Diego Supercomputer Center, which are also collaborators on the Open Grid Computing Environments project, to provide hardware for researchers.
TACC's Gordon system and SDSC's Dash will provide "computational resources on the backend" that will support NCGAS, Barnett said, while IU will provide both consulting services and software support on its recently purchased large-memory computer cluster called Mason.
Manufactured by Hewlett-Packard, Mason has 16 nodes, each with 32 cores and 512 gigabytes of random access memory.
Barnett told BioInform that when the center is fully functional, 12 of Mason's nodes will be dedicated to NCGAS' genome analysis activities while the other four nodes will be reserved for IU's use.
Separately, IU has developed "wide area network distributed file system technologies" that allow "high speed parallel file systems to be mounted at multiple locations simultaneously," making it possible to "bring data to Mason and systems like it," he explained.
This reduces the need to move data around various locations, he continued. "We can just set up wide area network links for these high speed data systems and they can be mounted at the different systems," he said.
IU isn't planning to buy additional hardware for the center at present, though it will continue to "assess the growing needs for computational resources for genomics analysis," Barnett said. Whether the center expands will depend at least partially on its successful adoption by the community, he added.
"We certainly see a lot of potential there ... the needs are there and are growing really rapidly," he said. "And there just isn't the computational capacity to meet these needs and so we see this as an area of growth."
At present, IU's research technologies group has 60 full-time staff, many of whom will be tasked to support the center, Barnett said.
He said the university is currently accepting applications for a center manager.
An ideal candidate will have a background in both biology and computer science because they will serve as the "primary brace" between biologists and the technologies they need for their research, he said.