NEW YORK (GenomeWeb) – Researchers from the University of Chicago and other institutions have developed a new computational method dubbed PrediXcan that they claim improves on existing methods for detecting genes associated with complex diseases and traits.
The developers believe that their method helps elucidate the biological mechanisms that underlie the associations between genetic variants and traits in genome-wide association studies, thus addressing a gap that other GWAS data-analysis tools, such as single-variant tests and other gene-based approaches, have left unfilled.
While existing methods successfully identify mutations that are associated with complex traits and diseases, an understanding of the biology behind those discoveries is an important prerequisite to translating that knowledge into actionable drug targets or better treatments, Hae Kyung Im, an assistant professor of genetic medicine at the University of Chicago and lead author on the study, told GenomeWeb this week.
Im and her colleagues developed PrediXcan to meet that need. As they explained in a Nature Genetics paper published this week, the freely available tool combines gene expression information with phenotype information to make predictions about which genes likely play a role in heritable diseases or phenotypic traits. Specifically, their method estimates "the component of gene expression [in a sample that is] determined by an individual's genetic profile and [then] correlates [the] 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype," they wrote in the paper.
In other words, "we take the DNA variation data and we predict what levels of expression would be a consequence of that variation data," Im explained. "By doing that we are able to impute expression levels ... of genes for each person and then we associate that with disease data or the complex trait." One of the nice things about the method is that "instead of giving you SNP numbers that don't really tell [you] much..., it gives you genes that we have learned a lot about through animal models and other studies," she added.
Full details of the approach are provided in the paper, but essentially PrediXcan works by first categorizing expression information from an input sample into three groups: expression that results from genetic regulation and is known to be associated with disease; expression that results from a given trait; and expression related to other factors such as environment. It then uses prediction models trained on transcriptome data from projects such as the NIH's Genotype-Tissue Expression (GTEx) project and the Genetic European Variation in Health and Disease study to estimate which fraction of the observed expression is the result of genetic regulation. Im and her group have made these models publicly available in a database dubbed PredictDB.
Researchers can then use various regression or non-parametric tools — depending on the phenotype being analyzed — to correlate the genetically regulated expression component with the phenotype of interest.
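The two steps described above, imputing expression from genotype and then testing it against a phenotype, can be illustrated with a minimal Python sketch. This is not the PrediXcan code: the function names, the simulated genotype dosages, the per-SNP weights (standing in for a PredictDB-style model), and the choice of a simple linear regression for a quantitative trait are all assumptions made for illustration.

```python
import numpy as np

def impute_expression(dosages, weights):
    """Predict the genetically regulated expression of one gene as a
    weighted sum of SNP dosages (hypothetical stand-in for a trained model)."""
    return dosages @ weights

def associate(predicted_expr, phenotype):
    """Regress a quantitative phenotype on predicted expression.
    Returns the slope (direction of effect) and its t-statistic."""
    x = predicted_expr - predicted_expr.mean()
    y = phenotype - phenotype.mean()
    sxx = np.sum(x * x)
    slope = np.sum(x * y) / sxx
    residuals = y - slope * x
    std_err = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2) / sxx)
    return slope, slope / std_err

# Simulated data (assumption): 500 people, 20 SNPs near a single gene.
rng = np.random.default_rng(0)
n_people, n_snps = 500, 20
dosages = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)
weights = rng.normal(0.0, 0.3, size=n_snps)  # illustrative model weights

# Step 1: impute expression from genotype alone; no measured
# transcriptome data is needed for the study participants.
grex = impute_expression(dosages, weights)

# Simulated trait that rises with the gene's predicted expression.
phenotype = 0.8 * grex + rng.normal(0.0, 1.0, size=n_people)

# Step 2: test the imputed expression against the phenotype.
slope, t_stat = associate(grex, phenotype)
direction = "higher" if slope > 0 else "lower"
print(f"{direction} predicted expression is associated with the trait")
```

The sign of the slope is what lets a gene-level test of this kind suggest whether higher or lower expression accompanies the trait. For a case-control phenotype, a logistic regression or a non-parametric test would take the place of the linear fit.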
According to the developers, theirs is the first approach that takes gene regulation mechanisms into account when making predictions about gene effects. PrediXcan not only successfully uses expression data to detect both known and novel causal genes; it's also able to predict the direction of the effect — that is, whether high or low levels of expression might spur the disease or trait that's being observed, which may provide opportunities for the development of targeted therapies, Im and her colleagues said. This ability to predict the directionality of effects also offers opportunities for systems-based analysis of disease development, the researchers said.
PrediXcan also offers other advantages that are highlighted in the Nature Genetics paper. For example, since no actual transcriptome data is required from the GWAS data contributors, "the method can be applied to any existing datasets with large-scale genome interrogation such as those in the database of Genotypes and Phenotypes or other similar repositories," the researchers wrote.
Moreover, there's ample access to reference transcriptome datasets for prediction model building from groups such as GTEx, which has data from about 40 different tissues and can support a variety of studies. Also, PrediXcan can help researchers explore relationships between both common and rare variants and phenotypes, although in the case of rare variants researchers would need larger training datasets to generate strong prediction models, the paper states.
The Nature Genetics paper includes results from applying PrediXcan to seven complex disease phenotypes studied under the auspices of the Wellcome Trust Case Control Consortium (WTCCC). Specifically, the researchers used their models to correlate estimated genetically regulated gene expression levels for nearly 8,700 genes with the seven WTCCC phenotypes.
Their analysis of the WTCCC data revealed a number of known gene-disease associations but also identified some new links. As reported in the paper, they found 41 significant associations between genes and five of the selected diseases, including 29 genes tied to type 1 diabetes. They also found genes known to be associated with two other autoimmune diseases, Crohn's disease and rheumatoid arthritis. One of the genes that the method found to be linked to both diabetes and RA had not previously been linked to either condition, according to the paper.
Next up for Im and her team is to use PrediXcan in additional studies. Im has just received an R01 grant from the National Institutes of Health that will support efforts to study associations between genes and mental health disorders. She also hopes that the method will find use in other labs studying other phenotypes, she said, adding that a number of other research groups have shown interest in using it in their projects.
One of these groups, at Stanford University, is focused on skin cancer, while another group elsewhere is interested in the method for diabetes studies. Other efforts will focus on improving the prediction models in PredictDB and on continued development of an extension to PrediXcan, called MetaXcan, that will make it possible to use GWAS summary statistics instead of individual-level genotype data to make these same sorts of predictions about gene-phenotype associations, Im said. That should make it much easier for researchers to explore and analyze far larger datasets.