NEW YORK – Researchers at Decode Genetics (a subsidiary of Amgen), the University of Iceland, and Reykjavik University have conducted a population-scale study combining plasma proteomic, genomic, and transcriptomic data to create a new resource that could be used to further elucidate disease pathogenesis.
In a study published in Nature Genetics on Thursday, the researchers described genome-wide association studies of plasma protein levels measured with nearly 5,000 aptamers in more than 35,000 Icelanders. They found more than 18,000 associations between sequence variants and plasma protein levels, 19 percent of them with rare variants. Overall, 93 percent of these associations are novel. They also tested plasma protein levels for associations with 373 diseases and other traits and identified 257,490 such links.
Decode researchers conducted a similar proteomics study in June, in which they used proteomic measurements to develop predictors for short- and long-term risk of all-cause mortality. The predictors were able to identify a high-risk group of subjects between 60 and 80 years old, 88 percent of whom died within 10 years, as well as a low-risk group in which 1 percent died within 10 years.
For their new paper, they integrated protein quantitative trait loci, or pQTL, and genetic associations with diseases and other traits and found that 12 percent of 45,334 lead associations in the GWAS Catalog are with variants that are in high linkage disequilibrium with pQTL. They further identified 938 genes encoding potential drug targets with variants that influence levels of possible biomarkers.
"Proteomics can assist in solving one of the major challenges in genetic studies: to determine what gene is responsible for the effect of a sequence variant on a disease," Decode CEO Kari Stefansson, co-senior author on the paper, said in a statement. "In addition, the proteome provides some measure of time because levels of proteins in blood rise and they fall as a function of time to and from events."
The authors noted that the data can also be used to assist with drug discovery and development.
The researchers measured plasma protein levels with SomaLogic's SomaScan multiplex aptamer assay in 35,559 Icelanders with genotype and phenotype information and analyzed 4,907 aptamers that measured 4,719 proteins. Accounting for multiple testing, the levels of 63 percent of the proteins correlated positively and 18 percent correlated negatively with age, they found. Further, levels of 33 percent of the proteins were higher in men and 23 percent were higher in women.
The investigators then tested 27.2 million variants in the genome for associations with plasma protein levels in their Icelander cohort. They found 18,084 primary, or sentinel, pQTL associations, each representing the most significant association with levels of a protein in a region. Of the 18,084 sentinel pQTL associations, 21 percent were accompanied by secondary variants in the same region that were associated with the same protein, based on conditional analysis. Combined, the team found a total of 28,191 sentinel and secondary pQTL associations.
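The idea of a sentinel association, the most significant association for a given protein within a region, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout, region labels, variant names, and p-values below are all hypothetical, and the conditional analysis used to find secondary variants is not modeled.

```python
def sentinel_associations(records):
    """Keep the most significant record for each (protein, region) pair."""
    best = {}
    for rec in records:
        key = (rec["protein"], rec["region"])
        # A smaller p-value means a more significant association.
        if key not in best or rec["p_value"] < best[key]["p_value"]:
            best[key] = rec
    return best


# Hypothetical toy data, invented purely for illustration.
records = [
    {"protein": "P1", "region": "chr1:1-2Mb", "variant": "v1", "p_value": 1e-12},
    {"protein": "P1", "region": "chr1:1-2Mb", "variant": "v2", "p_value": 1e-30},
    {"protein": "P1", "region": "chr5:7-8Mb", "variant": "v3", "p_value": 1e-9},
    {"protein": "P2", "region": "chr1:1-2Mb", "variant": "v1", "p_value": 1e-15},
]

sentinels = sentinel_associations(records)
for (protein, region), rec in sorted(sentinels.items()):
    print(protein, region, rec["variant"], rec["p_value"])
```

In this toy example, variant v2 is the sentinel for protein P1 in the first region because its p-value is the smallest there. The same variant can be a sentinel for more than one protein, as v1 is for P2.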
The researchers also used colocalization between pQTL in plasma and expression quantitative trait loci, or eQTL, in different tissues as a means to assess how well plasma protein levels reflected biological processes in different tissues. They found a strong positive correlation between the number of cis eQTL in a tissue and the probability of finding a pQTL in plasma in high linkage disequilibrium with an eQTL in that tissue. Overall, they noted, this indicated that a large fraction of proteins ends up in the blood.
The researchers then tested for associations between protein levels and a set of 373 diseases and other traits and identified 257,490 such links. These can indicate that the altered protein level is a cause of the disease, a consequence of the disease, or a correlate of a disease risk factor without being a cause or a consequence, the researchers said. The sequence variants associated with plasma protein levels were assessed for their association with disease and vice versa. Of 45,334 lead variant-trait associations in the GWAS Catalog, 5,458 were in high linkage disequilibrium with a sentinel variant for at least one pQTL. Combined with disease and other trait associations, pQTL data can help to identify causal disease genes, they added.
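As a quick cross-check of the reported figures, the short Python sketch below recomputes the share of GWAS Catalog lead associations in high linkage disequilibrium with a pQTL, and the number of secondary pQTL associations implied by the sentinel and combined totals. The variable names are illustrative, and the secondary count is derived here by subtraction rather than quoted from the paper.

```python
# Figures as reported in the article.
lead_associations = 45_334      # lead variant-trait associations in the GWAS Catalog
in_high_ld_with_pqtl = 5_458    # of those, in high LD with a sentinel pQTL variant
sentinel_pqtl = 18_084          # sentinel pQTL associations
total_pqtl = 28_191             # sentinel plus secondary pQTL associations

# Share of lead associations that coincide with a pQTL signal.
share = in_high_ld_with_pqtl / lead_associations
print(f"Share in high LD with a pQTL: {share:.1%}")  # prints 12.0%

# Secondary associations implied by the two totals (derived, not reported).
secondary_pqtl = total_pqtl - sentinel_pqtl
print(f"Implied secondary pQTL associations: {secondary_pqtl:,}")  # prints 10,107
```

The computed share agrees with the 12 percent figure cited for the GWAS Catalog overlap.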
"When the same variant in the sequence is associated with the risk of a disease and the levels of a protein, it may allow for the identification of the causal gene at the locus," the authors concluded. "Furthermore, if other variants in the genome that affect the levels of the protein also affect the risk of the disease, it is likely that the level of the protein plays a role in the pathogenesis of the disease rather than being a consequence of the disease. This is of a great importance, as causal genes at a large number of disease loci are unknown."
They added that pQTL data can also indicate the direction of the effect, which, combined with causal gene identification, is important for drug target and biomarker discovery.