NEW YORK (GenomeWeb) – Researchers from the University of Illinois at Urbana-Champaign and Stanford University have received a $1.3 million grant from the National Institutes of Health to develop novel data compression strategies that will help address challenges associated with data storage, transfer, and visualization.
Specifically, the partners plan to develop a suite of data compression software that can handle several types of genomic data, including DNA sequences, metagenomics data, quality scores for sequences, and data from gene function analysis.
As part of the project, the researchers will explore strategies that combine existing compression algorithms with novel algorithms that they create. They will also look at trade-offs between how much the size of a dataset can be reduced and the computing power necessary to achieve compression, and develop parallel processing strategies to decrease wait times for users of the resulting software. While each data type requires a unique compression approach, the researchers hope to identify methodologies that can be transferred across various types of data.
"We will cover the development of the algorithms, their analysis, prototyping of the software solutions, and benchmarking on real data," Olgica Milenkovic, an associate professor of electrical and computer engineering at the University of Illinois and one of the PIs on the project, said in a statement. She added that the group plans to collaborate with Mayo Clinic and potentially other institutions to promote use of the methods among biomedical researchers.
The grant is one of several new software development awards under the umbrella of the NIH Big Data to Knowledge (BD2K) Initiative, which supports efforts to improve the production, analysis, management, and accessibility of large biomedical datasets of all kinds.