NEW YORK (GenomeWeb) – Armed with roughly $3.2 million in funding from the National Cancer Institute, researchers from the University of Virginia and elsewhere are investigating the links between genetic variation and tissue-specific gene expression in breast and ovarian cancers.
Over the next five years, the researchers will use the funds to study the "pleiotropic mechanisms underlying susceptibility to ovarian and breast cancer," according to the grant abstract. To do this, they will perform tissue-specific, transcriptome-wide analyses of gene expression associated with the common risk alleles for the two diseases that genome-wide association studies have identified.
The goal is to identify mechanisms that aid in the discovery of genetic susceptibility genes and to determine the functions of these genes, said Joellen Schildkraut, a professor in the department of public health sciences at the University of Virginia and one of the principal investigators on the grant.
"The novelty of the study is the search of common genes affecting risk of two cancer subtypes through a transcriptome-wide approach and the validation of the gene associations using molecular techniques," she said. "It is the integration between the genome and the transcriptome." Ultimately, the aim is to identify genetic mechanisms that lead to improved survival or prevent tumor growth, she added.
The project has two key components, according to Schildkraut. The first is to use a computational tool called PrediXcan to predict gene expression from variant information in breast and ovarian cancer cases. The second is to perform functional analyses to determine how gene expression affects tumor development. According to the grant abstract, integrating germline genetic data with transcriptome-wide expression data will let the researchers reduce the multiple-testing burden of genome-wide association studies by grouping risk loci at the gene level, and will simplify the task of characterizing implicated pathways.
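As a rough illustration of why testing at the gene level eases the multiple-testing burden, consider a Bonferroni correction at a family-wise error rate of 0.05. The counts below are assumptions chosen for illustration, not figures from the grant: about one million independent common-variant tests (the convention behind the familiar 5 × 10⁻⁸ genome-wide threshold) versus roughly 20,000 genes. In practice, a gene-level method can only test genes for which an expression-prediction model exists, so the real gene count may be smaller.

```python
# Bonferroni-corrected per-test thresholds (illustrative counts, not from the study).
ALPHA = 0.05                 # desired family-wise error rate
N_VARIANT_TESTS = 1_000_000  # assumed number of independent common-variant tests
N_GENE_TESTS = 20_000        # assumed number of genes tested at the gene level


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


variant_threshold = bonferroni_threshold(ALPHA, N_VARIANT_TESTS)
gene_threshold = bonferroni_threshold(ALPHA, N_GENE_TESTS)

print(f"variant-level threshold: {variant_threshold:.1e}")  # 5.0e-08
print(f"gene-level threshold:    {gene_threshold:.1e}")     # 2.5e-06
print(f"relaxation: {gene_threshold / variant_threshold:.0f}x")  # 50x
```

Under these assumptions, each gene-level test faces a cutoff about 50 times less stringent than a single-variant test, which is one reason aggregating several risk loci into a single gene can surface signals that fall short individually.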
For the study, the researchers will rely heavily on publicly available genomic data, including RNA-sequencing data from both tumor and normal tissue, Schildkraut said. Specifically, they will draw on genotype and phenotype data from sources such as The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. They will also use datasets from the Ovarian Cancer Association Consortium, which focuses on identifying genes related to the risk of developing ovarian cancer, and the Breast Cancer Association Consortium, an international consortium of 84 epidemiological and clinical breast cancer studies that together include more than 100,000 breast cancer patients and controls. In addition, they will incorporate methylation, genome-wide association study (GWAS), and SNP data.
To model the link between variants and gene expression, the researchers will use PrediXcan, a gene-based association method that prioritizes genes likely to be causal for a given phenotype. PrediXcan was developed by researchers at the University of Chicago and elsewhere to elucidate the biological mechanisms underlying the associations between genetic variants and traits found in genome-wide association studies. The method can integrate multiple sources of genomic and transcriptomic data to identify the role of genetically regulated gene expression in disease, for example in the pathogenesis of breast and ovarian cancers.
The software works by decomposing gene expression into distinct components, then using prediction models trained on reference transcriptome data to estimate the component of expression that is determined by an individual's genetic makeup. Researchers can then use regression or non-parametric tools to test this genetically regulated expression component for association with the phenotype of interest. At the time of its release, PrediXcan's developers noted that it was the first approach to account for gene regulation mechanisms when making predictions about gene effects.
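The two-step logic (predict the genetically regulated expression of a gene, then test it against a phenotype) can be sketched in Python. This is a toy illustration, not PrediXcan's code: the SNP weights are hypothetical stand-ins for a trained prediction model, the genotypes and phenotype are simulated, and a plain linear regression stands in for whatever test a real analysis would use (a case-control phenotype would typically call for logistic regression).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_people = 2000

# Hypothetical per-SNP weights for one gene, standing in for a trained model.
weights = np.array([0.4, -0.2, 0.0, 0.3, 0.1])
allele_freqs = np.array([0.1, 0.25, 0.4, 0.3, 0.5])

# Simulated genotype dosages: 0, 1 or 2 copies of each SNP's effect allele.
dosages = rng.binomial(2, allele_freqs, size=(n_people, weights.size))

# Step 1: genetically regulated expression = weighted sum of dosages.
predicted_expr = dosages @ weights

# Simulated continuous phenotype with a true effect of 1.0 per unit of
# predicted expression plus unit-variance noise.
phenotype = 1.0 * predicted_expr + rng.normal(0.0, 1.0, n_people)


def association_test(x, y):
    """Step 2: least-squares regression of y on x with an intercept.

    Returns the slope estimate and its t statistic.
    """
    X = np.column_stack([np.ones_like(x), x])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the estimates
    return coef[1], coef[1] / np.sqrt(cov[1, 1])


slope, t_stat = association_test(predicted_expr, phenotype)
print(f"slope = {slope:.2f}, t = {t_stat:.1f}")
```

Because only the genetically predicted part of expression enters the test, an association cannot be driven by the disease altering expression. The trade-off is that the test is only as good as the prediction model's weights.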
Since a paper describing PrediXcan was published in 2015, its developers have made several extensions to the software. "Given the need for scalable integrative methods to understand the mechanisms that link genetic variants with phenotypes, we developed S-PrediXcan, which robustly infers PrediXcan results using summary results instead of individual level data," Hae Kyung Im, an assistant professor of genetic medicine at the University of Chicago and lead developer of PrediXcan, said in an email. "We have applied this method to a broad set of phenotypes creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes." Im's group has also developed PredictDB, which houses genetic prediction models of transcriptome levels for use with PrediXcan, and MetaXcan, a separate set of tools for performing integrative gene mapping studies.
Furthermore, "from analyzing these broad set[s] of phenotypes we find that there is ubiquitous cross-tissue sharing as well as context specificity of gene expression regulation," Im said. "This suggests that aggregating information across tissues as well as zooming into highly context-specific samples will be needed to further elucidate the pathogenic mechanisms."
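The summary-statistics idea behind S-PrediXcan can be illustrated with a simplified form of the approximation its developers describe: for a gene g, z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the prediction-model weights, z_l the GWAS z-scores of the model's SNPs, σ_l each SNP's standard deviation, and σ_g² = wᵀΓw the variance of predicted expression given the SNP covariance matrix Γ. The sketch below assumes Γ comes from a reference panel; the function name and inputs are hypothetical, and it is not the MetaXcan implementation.

```python
import numpy as np


def s_predixcan_z(weights, snp_z, snp_cov):
    """Approximate a gene-level association z-score from GWAS summary statistics.

    weights: prediction-model weights w_l for the SNPs in one gene's model
    snp_z:   GWAS z-scores (effect estimate / standard error) for those SNPs
    snp_cov: covariance matrix of the SNPs' dosages, e.g. from a reference panel
    """
    weights = np.asarray(weights, dtype=float)
    snp_z = np.asarray(snp_z, dtype=float)
    snp_cov = np.asarray(snp_cov, dtype=float)
    snp_sd = np.sqrt(np.diag(snp_cov))              # sigma_l for each SNP
    gene_sd = np.sqrt(weights @ snp_cov @ weights)  # sigma_g of predicted expression
    return float(np.sum(weights * snp_sd / gene_sd * snp_z))


# Two uncorrelated, unit-variance SNPs with equal weights, each with z = 2.
z_gene = s_predixcan_z([1.0, 1.0], [2.0, 2.0], np.eye(2))
print(round(z_gene, 2))  # 2.83
```

In this toy case, two modest SNP-level signals combine into a stronger gene-level signal, and no individual-level genotypes are needed, only per-SNP z-scores, model weights, and a reference covariance matrix.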
Schildkraut said she became interested in the method after PrediXcan's developers described it, and its application to data from diabetes and Crohn's disease cases, during a seminar at her institution. She noted that the tool showed how tissue "at a very specific target site or even tissues that may not be at the target site" may provide information for identifying genes involved in the risk of a given disease. In partnership with her co-principal investigator, Simon Gayther, director of molecular epidemiology at Cedars-Sinai, "we decided to look at high-grade serous ovarian cancer and triple negative breast cancer," she said.
For the current study, Im and her team will develop the statistical methods needed to meet the project's goals, she said. "We will be using all existing transcriptome data such as GTEx, TCGA, and the highly context-specific tissues from the Gayther lab." The results will be published in peer-reviewed journals, and the researchers will make the data available in repositories such as dbGaP, Schildkraut said.