NEW YORK (GenomeWeb) – A team led by researchers at the University of Iceland and the Icelandic Heart Association has completed a large-scale analysis of the human serum proteome.
Published today in Science, the study identified 27 distinct protein network modules, linking them to a variety of health outcomes and demonstrating relationships between DNA sequence variants and the protein network modules. The findings suggest that "coordinated variance of serum proteins may offer unrecognized opportunities for target and biomarker identification in human disease," the authors noted.
That proteins or protein networks could prove useful as disease biomarkers is not a new idea, but few studies have generated datasets with both the depth of coverage and the large cohort size of the Iceland effort. Using SomaLogic's SomaScan technology, which employs a form of aptamer for highly multiplexed protein detection, the researchers measured 4,137 different human proteins in serum taken from 5,457 subjects from the AGES Reykjavik study, which the authors described as "a prospective study of deeply phenotyped and genotyped subjects over 65 years of age."
Using weighted gene-to-gene co-expression analysis, the researchers grouped the proteins based on their apparent co-regulation, the notion being that proteins with functional relationships should exhibit a level of coordination in terms of their variance across subjects or conditions. This analysis identified 27 "co-regulatory modules" consisting of as few as 20 proteins and as many as 921. To test the modules, the researchers divided the AGES cohort into training and test sets and ran statistical tests to determine whether the modules were preserved.
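For readers unfamiliar with co-expression analysis, the general idea can be sketched in a few lines of Python. This is an illustrative toy and not the authors' pipeline: the data are synthetic, the soft-threshold power `beta=6` and the `cutoff=0.1` value are assumed for the example, and the grouping step uses simple connected components as a stand-in for the hierarchical clustering typically used in weighted co-expression methods.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_adjacency(profiles, beta=6):
    """Weighted network: adjacency = |r| ** beta (beta is an assumed value)."""
    p = len(profiles)
    adj = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj

def modules_from_adjacency(adj, cutoff=0.1):
    """Label proteins by connected component over edges above the cutoff.

    A simplified stand-in for hierarchical clustering of the network.
    """
    p = len(adj)
    label = [-1] * p
    current = 0
    for start in range(p):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(p):
                if label[j] == -1 and adj[i][j] > cutoff:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label

# Synthetic example: 8 hypothetical proteins measured in 200 subjects.
# Proteins 0-3 track one hidden factor, proteins 4-7 track another.
rng = random.Random(0)
n_subjects = 200
factor_a = [rng.gauss(0, 1) for _ in range(n_subjects)]
factor_b = [rng.gauss(0, 1) for _ in range(n_subjects)]
profiles = []
for factor in (factor_a, factor_b):
    for _ in range(4):
        profiles.append([f + 0.3 * rng.gauss(0, 1) for f in factor])

adj = soft_adjacency(profiles)
labels = modules_from_adjacency(adj)
print(labels)
```

Raising the absolute correlation to a power suppresses weak, likely spurious correlations while keeping strong ones, so the two planted groups in this toy input should come out as separate modules.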
"Permutation testing of the data indicated that these modules were unlikely to have occurred by chance," they wrote.
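A permutation test of this kind can be sketched as follows. The article does not say what the authors permuted, so the null model here (random protein sets of the same size as the module) is an assumption, as are the synthetic data, the module membership, and the function names. The sketch asks whether a module defined in a training set shows tighter correlation in held-out subjects than randomly drawn protein sets do.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_corr(profiles, members):
    """Average |r| over all protein pairs within a candidate module."""
    pairs = [(i, j) for k, i in enumerate(members) for j in members[k + 1:]]
    return sum(abs(pearson(profiles[i], profiles[j])) for i, j in pairs) / len(pairs)

def permutation_p_value(profiles, members, n_perm=500, seed=1):
    """Compare the module's cohesion with random same-size protein sets."""
    rng = random.Random(seed)
    observed = mean_abs_corr(profiles, members)
    all_idx = list(range(len(profiles)))
    hits = 0
    for _ in range(n_perm):
        random_members = rng.sample(all_idx, len(members))
        if mean_abs_corr(profiles, random_members) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic held-out data: 20 hypothetical proteins in 150 test-set subjects.
# Proteins 0-4 share a hidden factor; the remaining 15 are independent noise.
rng = random.Random(42)
n_subjects = 150
factor = [rng.gauss(0, 1) for _ in range(n_subjects)]
test_profiles = [[f + 0.3 * rng.gauss(0, 1) for f in factor] for _ in range(5)]
test_profiles += [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(15)]

# Module membership assumed to have been learned on a separate training set.
module = [0, 1, 2, 3, 4]
observed, p_value = permutation_p_value(test_profiles, module)
print(f"mean |r| in module: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

A small p-value here means that few random protein sets are as tightly correlated as the module in the held-out subjects, which is the sense in which a module is "unlikely to have occurred by chance."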
Correlating the modules to disease outcomes, they found links between different protein networks and conditions including coronary heart disease, heart failure, type 2 diabetes, and metabolic syndrome. Certain modules were also linked to incident disease, all-cause mortality, and coronary heart disease mortality, indicating, the researchers said, "that the protein network predicted future events and disease outcome."
They also integrated the protein data with data on SNPs identified via genome-wide association studies. Among their findings was that roughly 60 percent of the genetic effects on serum protein expression appear to be due to either post-transcriptional regulation or some "yet unknown transcriptional effect."
Additionally, they identified a number of examples of how individual SNPs might contribute to the structure of serum protein networks. For instance, they found that "distinct loci at APOE and BCHE were associated with the lipoprotein-enriched [serum protein] module PM11." These loci, they noted, had cis effects on the APOE and BCHE proteins and had trans effects on 89 percent of the proteins (64 total) in the PM11 module.
"These results highlight the genetic architecture of the serum protein network and show that the modules and disease variation are intimately connected," the authors wrote.
The study "underscores the role of protein networks as the sensors and integrators of complex disease," they noted, adding that "the serum proteome may be a rich and accessible setting to mine for biomarkers of disease and disease responses to integrate information from tissues in a global regulatory network."