NEW YORK – A pair of studies from two independent research teams showcases how scientists are using data from the UK Biobank (UKB) Pharma Proteomics Project (PPP) to identify biomarkers linked to disease and other biological processes.
In one study, published in Nature Medicine in July, a team led by researchers at the University of Cambridge used the UKB-PPP data to identify protein profiles predictive of more than 50 diseases, while in the other, published in the same journal earlier this month, a team led by scientists at the University of Oxford used the data to develop a proteomic "clock" for assessing individuals' biological age and general health.
Both studies made use of the most recent data release from the UKB-PPP, which became available to researchers outside the project — a consortium of 13 biopharma companies — in the fall of 2023. The dataset consists of measurements of roughly 3,000 proteins in blood samples from 54,000 UKB participants using Olink's Explore platform.
The UKB-PPP "is a pretty incredible dataset," said Austin Argentieri, a research fellow at Harvard Medical School, affiliate member at the Broad Institute, and first author on the proteomic clock study. He noted that access to such a large proteomic database linked to comprehensive clinical and genomic data was key to his team's work.
In their study, Argentieri and colleagues used the proteomic data from 31,808 individuals in the UKB cohort to train models for predicting their chronological ages. They generated models using six different machine learning methods, then tested them in a separate 13,633-individual cohort from the UKB as well as in two external cohorts: 3,988 subjects from the Chinese Kadoorie Biobank (CKB) and 1,990 subjects from the FinnGen biobank. Ultimately, they arrived at a model using 204 plasma proteins and a gradient boosting-based machine learning approach that correlated strongly with chronological age in all three cohorts. They then pared this down to a 20-protein model that retained 95 percent of the predictive power of the 204-protein model.
Using their models, the researchers calculated what they termed the ProtAgeGap for the subjects in the three cohorts: the difference between an individual's age as predicted by the proteomic model, or ProtAge, and their actual age. In the UKB cohort, the ProtAgeGap spanned 12.3 years, with the top 5 percent of individuals having a ProtAge 6.3 years older than their actual age and the bottom 5 percent a ProtAge six years younger.
Comparing ProtAgeGap with existing measures of physical health and biological aging, the researchers found that an individual's ProtAgeGap score was significantly associated with a broad range of such measures. They also found the score was predictive of a person's risk of all-cause mortality as well as common diseases including osteoarthritis, type 2 diabetes, chronic kidney disease, and a number of cancers. The ProtAgeGap score was also associated with functional traits like walking speed, handgrip strength, and cognitive function.
A number of groups have previously developed models of biological aging based on DNA methylation, but many of these clocks are "only weakly associated with mortality risk and aging-related function," the authors wrote. Argentieri said he and his colleagues believe proteomics-based approaches could yield stronger functional associations as well as models that are more generalizable across diverse cohorts. Several other research teams have developed proteomic aging clocks, including a 2023 effort led by researchers at the Biomedical Primate Research Centre in Rijswijk, Netherlands, that used data on more than 37,000 individuals generated using SomaLogic's (now Standard BioTools') SomaScan platform.
Argentieri said it remains an open question how exactly proteomic aging clocks might be put to use but that a potential application is as a preventative medicine tool for gauging individuals' general health and future disease risk.
"We envision this as something you can do early and often," he said. "Test when you are young. See what trajectory you are on … and then if you and your physician don't like the picture you see, you can start to course-correct."
Argentieri and his colleagues are currently evaluating their model within several clinical trials looking at interventions like adjustments in diet and physical activity to see if it can pick up changes in patient health produced by those interventions.
If improvements in patient health are reflected in the clock measurements, "that will give us some confirmation that this is a biomarker that will tell you about how well what you are doing is working," he said.
The researchers have filed for patents in the US and UK based on the results of their Nature Medicine study, Argentieri said, though he added that they are still "in the early days" of developing the tool. He and his colleagues are currently in discussions with several proteomics companies to develop an assay panel targeting the 20 proteins used in their model.
The Cambridge team likewise used the UKB-PPP data, in its case to predict individuals' disease risk across a wide range of conditions rather than to build a proteomic clock. Specifically, the researchers aimed to develop proteomic risk models for the 218 diseases that had more than 80 cases in the UKB cohort.
Using training sets consisting of 70 percent to 75 percent of a 41,931-subject subset of the UKB-PPP cohort and validation sets consisting of 25 percent to 30 percent of the same subset, the researchers identified panels of between five and 20 proteins that, in the case of 67 diseases, improved prediction of a patient's 10-year risk compared to models based on clinical information alone. In the case of 52 of these 67 diseases, the protein panels improved risk prediction compared to clinical information combined with routine clinical blood tests.
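The article does not name the metric behind "improved prediction." As a minimal sketch, assuming a random 70/30 split of subjects and Harrell's concordance index (C-index), a common score for time-to-event risk models, the comparison between a clinical-only model and one with an added protein panel might look like the following. The function names, seed, and toy validation data are hypothetical; the study's actual splitting and scoring may differ.

```python
import random


def split_cohort(subject_ids, train_frac=0.7, seed=0):
    """Randomly partition subject IDs into training and validation sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = round(len(ids) * train_frac)
    return ids[:cut], ids[cut:]


def harrell_c(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had the event (events[i] is
    truthy) before subject j's follow-up time ended. The pair is concordant
    when i also has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical validation set: years to diagnosis (or censoring) and event flags.
times = [1, 2, 3, 4]
events = [1, 1, 1, 0]
clinical_only = [2, 3, 1, 4]   # risk scores from clinical variables alone
with_proteins = [4, 3, 2, 1]   # risk scores after adding a protein panel
print(harrell_c(times, events, clinical_only), harrell_c(times, events, with_proteins))
```

Scoring both models on the same held-out subjects, disease by disease, is one way to arrive at counts like the 67 diseases where a protein panel beat clinical information alone.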
Like the Oxford team, the Cambridge researchers see their protein panels as potentially useful risk assessment tools, said Julia Carrasco-Zanini, first author of the study and a postdoctoral researcher at Queen Mary University of London. Carrasco-Zanini was a graduate student at Cambridge when the study was conducted.
"At the moment, we are thinking of the signatures as screening tools, not diagnostic tools," she said. "For instance, if we have something like idiopathic pulmonary fibrosis, where we see that the predictive signature is very good, we think, OK, if we screen people and have a group we can identify as being at very high risk, perhaps they could be followed more closely with some sort of imaging every few years or so."
Carrasco-Zanini observed that the proteomic risk scores developed in the study outperformed polygenic risk scores (PRS) for 22 of the 23 diseases for which PRS were available in the UKB, the lone exception being breast cancer, which she suggested reflected that disease's large genetic component. She added that she and her colleagues also conducted a more systematic comparison of proteomic signatures and PRS using the EPIC-Norfolk cohort, which they published in Lancet Digital Health in July. That study similarly found proteomic risk scores to outperform PRS, with protein models showing better risk prediction for 17 of 23 outcomes investigated.
Moving forward, Carrasco-Zanini said the researchers plan to validate their findings in additional, more diverse cohorts. They also aim to benchmark their panels against existing markers that were not available in the UKB cohort. For instance, she noted, while the team's model for multiple myeloma was "highly predictive," measurements of M-protein levels, which are commonly used to test for the condition, were not available in the UKB-PPP data.
Ultimately, Carrasco-Zanini said, they hope to develop clinical-grade assays they could use in clinical research and to test some of their panels in screening trials.
Argentieri and Carrasco-Zanini both said they expect the UKB-PPP will allow researchers to explore a variety of questions around proteomics, as it is the largest such resource made available to date.
Beyond the scale of its proteomic measurements, the biobank's detailed genomic and phenotypic data "really expands the opportunities for addressing multiple questions of proteomic research, which is why we are starting to see this wave of different papers around this topic," Carrasco-Zanini said.
Argentieri suggested that as the UKB-PPP demonstrates its value as a research tool, it will drive other biobanks to add to their proteomic datasets.
He cited the example of the CKB and FinnGen biobanks he and his colleagues used in their research. "They've got proteomics on a few thousand [subjects], and they have aspirations now to have it on tens of thousands or more, because I think everyone is seeing that large-scale human population proteomics just gives you so much power for understanding disease biology, making predictive models, understanding disease progression."