NEW YORK – A team led by researchers at the University of Bristol has completed a large-scale analysis of the relationship between plasma proteins and disease phenotypes.
Detailed in a paper published this week in Nature Genetics, the work used two-sample Mendelian randomization and colocalization analysis to assess the impact of 1,002 plasma proteins on 225 diseases, identifying 111 potential instances where changes in protein expression appeared to impact disease.
The findings indicate that such an approach could prove useful for drug development work, providing pharma researchers with a tool for identifying and prioritizing potential targets, said Jie Zheng, a senior research associate at the University of Bristol and first author on the study.
Mendelian randomization uses the random distribution of genetic variants within a population to explore the links between these variants and phenotypes of interest.
Much as a traditional drug trial has a case group and a placebo group, Mendelian randomization lets researchers divide people into two groups, one with a particular genetic variant and the other without it, Zheng said.
In the case of the Nature Genetics study, the group looked at the changes in protein expression caused by these genetic variants and whether those changes were associated with changes in disease phenotypes of interest. This allowed them to identify proteins that could potentially be targeted with therapies for these diseases.
"You have one group that is exposed to a higher protein level and another group that is exposed to a lower protein level," Zheng said. "And because proteins are the targets for most drugs, we can use this Mendelian randomization pipeline to mimic a very similar structure as a randomized controlled trial."
While proteins are key drug targets, the technical and throughput challenges involved in large-scale proteomic analyses mean that the availability of high-quality proteomic data has lagged behind the availability of genomic data, which has limited researchers' ability to integrate the two.
In recent years, however, a number of research teams have used SomaLogic's SomaScan platform to generate expression data on thousands of plasma proteins in large patient cohorts. One of the most notable was a 2018 study led by researchers at the University of Cambridge and Merck Research Laboratories, which quantified plasma levels of 3,622 proteins in blood samples from 3,301 healthy donors and searched for associations between those proteins and 10.6 million autosomal SNPs.
The Bristol team compiled the plasma proteome data and genetic variant information from this study and four others. Using that data, they then applied two-sample Mendelian randomization, "where we take the association between genetic variants and proteins as a sort of genetic predictor of plasma protein levels and take that into other studies that have disease outcomes," said Tom Gaunt, professor of health and biomedical informatics at Bristol.
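As a rough illustration of the idea Gaunt describes, the simplest two-sample estimate for a single genetic variant is the Wald ratio: the variant's effect on the disease outcome divided by its effect on the protein level. The Python sketch below is not the Bristol team's pipeline; the function names and the numbers are hypothetical, the standard error uses a common first-order approximation, and the colocalization step the study also relied on is not covered.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpAssociation:
    """Summary statistics for one variant from one study (hypothetical container)."""
    beta: float  # per-allele effect estimate
    se: float    # standard error of that estimate


def wald_ratio(snp_protein: SnpAssociation, snp_disease: SnpAssociation):
    """Estimate the effect of the protein on the disease from one variant.

    snp_protein comes from a proteomic study (variant -> protein level),
    snp_disease from a separate disease GWAS (variant -> disease).
    """
    if snp_protein.beta == 0:
        raise ValueError("variant has no estimated effect on the protein")
    estimate = snp_disease.beta / snp_protein.beta
    # First-order approximation: ignores uncertainty in the variant-protein effect.
    se = abs(snp_disease.se / snp_protein.beta)
    return estimate, se


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value under a normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up numbers, for illustration only.
protein_assoc = SnpAssociation(beta=0.5, se=0.05)
disease_assoc = SnpAssociation(beta=0.1, se=0.02)
estimate, se = wald_ratio(protein_assoc, disease_assoc)
print(estimate, se, two_sided_p(estimate, se))
```

When several independent variants predict the same protein, their per-variant ratios are typically combined, for example with an inverse-variance weighted average, rather than read one at a time.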
Zheng said that improvements in technology for making high-throughput measurements of thousands of plasma proteins, combined with efforts to compile data from large numbers of genome-wide association studies, have allowed for analyses like that presented in the Nature Genetics study.
"It's both [the genomic and proteomic] sides that have seen recent developments that have let us start working on large-scale analyses like this," he said, noting that most previous efforts have targeted their analyses to a smaller number of disease phenotypes of interest.
The Bristol team, by contrast, looked at 225 different disease phenotypes, identifying potentially causal relationships between 65 proteins and 52 diseases.
The researchers also looked at historical drug development data to assess how such analyses could improve target identification and prioritization. Zheng said that based on their initial findings, it appears that targets backed by Mendelian randomization information were roughly five times as likely to be successful as those that were not, though he noted that additional data was needed to confirm that result.
He said that pharma companies have begun using these approaches in their target prioritization pipelines and added that he and his colleagues are collaborating with a number of pharma firms in their work, though he declined to provide their names.
"Mendelian randomization has three major advantages," Zheng said. "The first is that it can predict future drug trial success. The second is that it can identify potential side effects for existing drugs. Also, it is possible to use it to repurpose existing drugs to new indications."
Gaunt noted that another advantage is that the method largely relies on data that is already being generated as part of other GWAS and plasma proteomic studies.
"It really doesn't cost a lot in comparison to experiment studies," he said. "It's a low cost piece of evidence to assist with prioritization of targets."
Zheng, Gaunt, and their colleagues are part of an effort at Bristol to build a resource called the OpenGWAS database, which aims to compile GWAS datasets from labs around the world to enable work like their Nature Genetics study as well as other kinds of analyses. The database currently contains around 34,000 GWAS datasets.
The researchers have also developed an open-access graphical database called EpiGraphDB that aims to integrate multiple levels of omics and health data, Zheng said, adding that the plasma proteomic work is part of this effort.
Gaunt said that while the researchers collaborate regularly with pharma firms, they have not formed any kind of commercial entity around this work and make all their data openly available.
He said that they might in the future launch a commercial venture, but any such effort would likely be focused on more in-depth work on specific targets as opposed to the broad analysis presented in their recent study.
"This kind of broad analysis of lots of targets is something that we want to make as available as possible, but we may work with people in more detailed follow-up analysis," he said.