NEW YORK (GenomeWeb) – Researchers from the University of California, San Francisco recently received a grant from the National Institutes of Health to add new features to existing software used to simulate genetic changes in populations over time and enable it to run on new infrastructure that will help reduce computational run times.
This is the first grant that Ryan Hernandez, an assistant professor in UCSF's department of bioengineering and therapeutic sciences, has received to support the development of the Simulate Finite Sites Under Complex Demographic Effects (SFS_CODE) software, a forward simulation program for modeling the impacts of selection and demographic history on human populations that he developed and first published in 2008. The five-year project, funded by the National Human Genome Research Institute, will receive more than $390,000 per year, including direct and indirect costs, for the duration of the grant.
According to the grant abstract, the researchers will use the funds to design a new graphical user interface for SFS_CODE and will also use the tool to compare and contrast various existing statistical tests. Other plans for the grant include developing new features such as the ability to "accommodate complex evolutionary models," and enabling the simulator to run on "heterogeneous computing architecture" including central processing units and graphics processing units (GPUs).
SFS_CODE, Hernandez told BioInform this week, provides a framework for trying to understand disease that works by simulating how generations of given populations evolve over time. In building models, the software takes into account information gleaned over the years about human demographic history and natural selection. It lets users "incorporate patterns of selection within these simulations … and embed within that a phenotype model which will basically say these individuals are more prone to disease because they have a set of mutations that convey some sort of risk," he explained. After the simulation is completed, "[we can] then ask what is the prevalence of the disease, and then under these sort of assumptions what does that imply about the genetic architecture of the disease."
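To give a sense of what a forward simulation does, here is a minimal Python sketch of the textbook Wright-Fisher model with selection, which follows a single mutation's frequency from one generation to the next. It is not SFS_CODE's implementation; the function name `wright_fisher` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random

def wright_fisher(pop_size, generations, init_freq, sel_coeff, seed=None):
    """Track one allele's frequency forward in time (haploid Wright-Fisher model).

    Illustrative only: every name here is an assumption, not SFS_CODE's API.
    sel_coeff must be greater than -1; negative values model a deleterious allele.
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: carriers have relative fitness 1 + sel_coeff.
        weighted = freq * (1.0 + sel_coeff)
        p_sample = weighted / (weighted + (1.0 - freq))
        # Drift: each of the pop_size offspring draws its parent at random.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < p_sample)
        freq = carriers / pop_size
        trajectory.append(freq)
    return trajectory
```

Calling `wright_fisher(1000, 200, 0.01, -0.02, seed=1)`, for example, follows a mildly deleterious mutation that starts at 1 percent frequency in a population of 1,000 for 200 generations. A full simulator would layer many sites, recombination, changing population sizes and a phenotype model on top of this basic loop.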
Hernandez researches patterns of genetic variation in populations, exploring specifically the contribution of demographic history and natural selection. The underlying idea is that "human populations … expanded quite dramatically within the last several thousand years, and prior to that there has been a lot of complex mixing and splitting of populations that resulted in what we have today in terms of complex population demography," he explained. "We know that natural selection has also played a significant role in shaping pattern[s of] genetic variation particularly with regards to negative selection or deleterious alleles arising and being removed from the population."
Researchers exploring the basis of complex diseases such as autism, he said, have looked at the possible role of both common and rare variants, asking questions such as, "How can you expect a larger effect size for rare variants compared to common variants? And if there are large effects at low frequency, then what's keeping them there?" SFS_CODE provides a tool for generating models that could explain observations from inherited disease studies, he said.
With the help of this grant, Hernandez and his team are now working on expanding their software to simultaneously model phenotypes and genotypes. This will allow users to explore the ways that "assumptions about the genetic architecture of a phenotype impact [their] observations." In terms of the planned changes to the GUI, Hernandez intends to simplify the interface so that users can essentially point and click their way through setting up and running simulations. They'll be able to, for example, set up and run a simulation that "model[s] patterns of linkage disequilibrium and functional elements" in specific regions of the genome by simply entering their region of interest and selecting the demographic model they want to use, he said.
Also, once the system has been optimized to compute on GPUs, which is also work planned under this grant, "then it will be possible to run a large set of simulations just on a desktop," he said, and the time required to run simulations will drop as well. As it currently stands, users need a cluster if they want to run several simulations in a timely fashion, but Hernandez's lab is working on porting the code, which is currently written in C, to OpenCL, an open standard for parallel programming that runs on different kinds of GPU architecture.
Also, once SFS_CODE is running more efficiently, Hernandez plans to add new statistical tools that will enable the software to do better inference of both demography and selection within human genomes, he said. Specifically, they'll use a statistical method called approximate Bayesian computation.
"When you have a theoretical model you want to fit, usually you say, 'Here is my data, and I am going to calculate the probability of this data under the model' … and then I can optimize the parameters using some sort of optimization technique," he explained. The problem is that "once you get outside of simple models, there is no way to really write down the probabilities." As an alternative, "you can just do simulations," he said. "You can basically say, 'Here are the parameters that I chose,' do a bunch of simulations and say, 'How well do they match my observed data?'"
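The procedure Hernandez describes corresponds to the simplest form of approximate Bayesian computation, often called rejection sampling. The Python sketch below illustrates it on a toy problem: estimating a mutation's frequency from the number of carriers seen in a sample of 100 people. The function names, the uniform prior, the tolerance and the observed count of 20 are all assumptions made up for this illustration and are not part of SFS_CODE.

```python
import random

def abc_rejection(observed, simulate, sample_prior, n_draws, tolerance, seed=None):
    """Keep the parameter draws whose simulated data land close to the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # "Here are the parameters that I chose"
        simulated = simulate(theta, rng)   # "do a bunch of simulations"
        # "How well do they match my observed data?"
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_carriers(freq, rng, sample_size=100):
    """Toy simulator (hypothetical): count carriers among sample_size people."""
    return sum(1 for _ in range(sample_size) if rng.random() < freq)

def uniform_prior(rng):
    """Toy prior (hypothetical): any frequency between 0 and 1 is equally likely."""
    return rng.random()

# Made-up observation: 20 carriers out of 100 sampled individuals.
posterior = abc_rejection(
    observed=20,
    simulate=simulate_carriers,
    sample_prior=uniform_prior,
    n_draws=5000,
    tolerance=2,
    seed=42,
)
estimate = sum(posterior) / len(posterior)  # mean of the accepted frequencies
```

The accepted draws approximate the range of parameter values consistent with the data, without ever writing down a probability for the data under the model. In practice the toy simulator would be replaced by a population genetic simulator, and the single count by a set of summary statistics computed from genomes.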