Big Data Program Funds Genome Data Analysis Project

NEW YORK (GenomeWeb News) – The National Science Foundation and the National Institutes of Health have awarded a $2 million grant under the Obama Administration's multi-agency 'Big Data' program to four universities to fund development of new technologies for analyzing genomic data.

Iowa State University, Stanford University, Virginia Tech, and the University of Michigan will use the grant to develop high-performance computing techniques for use on massively parallel systems to analyze high-throughput DNA sequencing data.

Led by Iowa State, which received $1.3 million of the total award, the partners will develop parallel algorithms and high-performance implementations for analyzing large sets of genomic data. They will house these tools in software libraries that life sciences researchers will be able to access, and they will design a domain-specific language that will automatically create computing codes to guide the researchers using these tools.

The investigators developing the tools include Iowa State Professors Srinivas Aluru and Patrick Schnable, Stanford Professor Oyekunle Olukotun, and Virginia Tech Associate Professor Wu Feng.

The grant was one of eight new awards totaling $15 million announced today by NSF that are being funded under the Core Technologies for Advancing Big Data Science and Engineering program, called Big Data, which launched in March.

Big Data's aim is to fuel development of new tools for managing, sorting, storing, and analyzing massive sets of data that are generated by new technologies, such as next-generation genome sequencing, in a range of fields, including physics, economics, psychology, and medicine.

"To get the most value from the massive biological data sets we are now able to collect, we need better ways of managing and analyzing the information they contain," NIH Director Francis Collins said in a statement today.

"The new awards that NIH is funding will help address these technological challenges—and ultimately help accelerate research to improve health—by developing methods for extracting important, biomedically relevant information from large amounts of complex data," Collins said.

"Seven years ago we were able to sequence DNA one fragment at a time," Aluru said in a statement. "Now researchers can read up to 6 billion DNA sequences in one experiment."

According to Iowa State, the goal of this project is to "empower the broader community to benefit from clever parallel algorithms, highly tuned implementations and specialized high performance computing hardware, without requiring expertise in any of these."

"We're hoping this approach can be the most cost-effective and fastest way to gain adoption in the research community," Aluru said. "We want to get everybody up to speed using high performance computing."

Aluru's previous research has involved plant genome assembly, comparative genomics, deep sequencing data analysis, and development of parallel bioinformatics methods and tools.

Feng's focus has been on the "synergistic intersection of life sciences and high-performance computing," and he has developed a tool for conducting a "massive sequence search" to identify the missing genes in genomes and created a framework for speeding up genome analysis, Virginia Tech said.