Rice U Team to Use $1.1M NSF Grant for Cloud-compatible Bayesian Tools for Evolutionary Studies


NEW YORK (GenomeWeb) – Two research groups from the computer science department at Rice University will use a three-year, $1.1 million grant from the National Science Foundation to develop cloud-based statistical software for analyzing evolutionary patterns.

Specifically, Christopher Jermaine and Luay Nakhleh, who are both associate professors of computer science at Rice, will use the NSF funds to create open-source cloud software that uses Bayesian inference techniques to track how genes and genomes evolve across species, and to make the software broadly available to the research community.

In practice, being able to run analyses in parallel and to access thousands of computers quickly in the cloud will significantly shorten the time to results, according to the developers. "We're talking about potentially taking a years- or decades-long computation and making it feasible by changing the underlying algorithm and making it amenable to distributed computing," Jermaine said in a statement. Moreover, a cloud-based approach would provide a potentially cost-effective alternative to purchasing and running large local clusters, they said. They believe it could even appeal to researchers who have mainframes in house because of the potential for parallelized analysis.

Although Bayesian inference is an otherwise powerful technique for estimating evolutionary history in phylogenetics studies, it is computationally impractical for large datasets, according to Nakhleh. "Analyzing data sets with 10 or 20 gene sequences can easily take hundreds of hours," he said in a statement. "But the tree of life has millions of sequences and is built from millions of species. There's no way traditional Bayesian techniques are even going to get close to handling that." It's currently infeasible, for example, to use these techniques to build trees composed of thousands of taxa or species, Nakhleh told BioInform.

Parallel and distributed computing infrastructure offers a solution to the intensive computational needs of phylogenetics researchers; however, very little research has explored the potential of this kind of infrastructure for these kinds of studies, Jermaine said. "There's a reasonably large amount of work on cloud-based Bayesian learning, but it's almost all for data analytics, not for biological applications," he told BioInform. For example, he and his colleagues have developed a system that lets users "write and execute codes for large-scale Bayesian models," he explained, adding, however, that on the whole "there are not many papers describing cloud-based phylogenetics tools, and I think it's safe to say that [nothing] has been targeted to Bayesian phylogenetics in particular."

The NSF grant will enable the Rice researchers to expand existing Bayesian methods and make them more amenable to parallel and distributed computing systems like the cloud. Over the next three years, they'll work on mathematical modeling and algorithm development, implementing and running the software on distributed systems, refining it to remove bottlenecks, and finally publishing the software.

"We want to deliver something that’s very easy to use," Jermaine said, so "that somebody can just boot up a machine instance on Amazon [for example]" and then with "a couple of key strokes, fire up a cluster under that machine's control and then run whatever they want to run."
