NEW YORK – Over the last decade, proteomics has shifted from an emphasis on depth of coverage and numbers of protein identifications to an increasing focus on sample throughput.
But while the field has come to see that large sample cohorts are necessary to account for biological variation and to detect the sometimes small changes indicative of a particular disease state, some have raised questions about whether existing high-throughput workflows provide enough depth to measure the proteins likeliest to prove effective biomarkers.
This question is particularly acute for plasma proteomics, which has seen a rebirth of interest in recent years. Plasma is an ideal sample for biomarker discovery, as it is easily accessible and a likely sample type for any ultimate clinical assay. However, human plasma both has a large dynamic range and is dominated by a few high-abundance analytes, with the 20 most abundant proteins making up more than 95 percent of its total protein mass. This makes it difficult to quantify lower-abundance proteins such as tumor markers or markers of cell damage using high-throughput assays that employ short LC gradients and little or no depletion and fractionation.
It is possible to measure on the order of 5,000 proteins or more in human plasma, but such experiments require extensive depletion and fractionation and are not very compatible with the kind of throughput many researchers aim to achieve. Higher-throughput mass spec-based plasma protein assays typically measure somewhere in the range of 300 to 500 proteins in undepleted plasma and between 500 and 1,000 proteins in depleted plasma.
"Throughput is wonderful, it's a very important thing," said Ian Pike, CSO of proteomics firm Proteome Sciences. "But we're almost going back to the 2D gel days where we are just going to be sequencing the same 200 to 300 medium-abundance proteins, and that probably isn't going to give a strong enough signature for every possible indication that we want to do."
"If you can find it with that [level of depth], great," he said. "But mostly you're going to be stuck in that pool of the sort of low microgram-, high nanogram-per-mL at that level of throughput."
The appropriate balance of coverage and throughput is "sort of the million-dollar question," said Jochen Schwenk, director of translational plasma profiling at the Science for Life Laboratory at Sweden's Royal Institute of Technology.
"The more you can measure, the more processes you are probably able to capture, but there may be other processes that are reflected in even the higher-abundant proteins," he said.
"If you have say, a severe case of COVID-19, there are a lot of changes you will see in that individual, maybe due to liver failure, maybe due to reduction in oxygen supply, so you will have these severe changes in phenotype," Schwenk said. "If you have, you know, a small tumor leaking a specific molecule, that will be much more difficult. Proteomics by its own will not be informative enough."
The question of what disease states existing plasma proteomic workflows can potentially address "is very hard to answer," said Philipp Geyer, chief scientific officer and co-founder of German plasma proteomics firm OmicEra Diagnostics. "This would be quite specific for each disease, and for most diseases we don't know how deep we have to go into the plasma proteome."
Like Schwenk, Geyer noted that some conditions are more likely addressable with existing proteomic technology than others.
"For example, for metabolic diseases like diabetes or liver diseases, we know that we can find new biomarkers," he said, citing the example of the protein PIGR (polymeric immunoglobulin receptor), which he and colleagues at the Max Planck Institute of Biochemistry, where Geyer was a post-doctoral fellow, identified in 2019 as a potential marker for non-alcoholic fatty liver disease.
He highlighted cardiovascular disease as another area where he expected researchers could find and develop markers with existing technology.
Cancer, particularly detection of early-stage tumors, will likely prove more challenging, Geyer said, noting the difficulty of detecting the small amounts of cancer-specific proteins secreted by tumor cells into the blood. This observation is perhaps borne out by the failure of proteomics generally to identify any widely used markers for cancer early detection despite significant efforts in this area throughout the field's 20-plus-year history.
Geyer added, though, that the success of a cancer biomarker effort will likely also vary with the type of cancer being explored, citing the example of thyroid cancer where serum thyroglobulin is an established marker for monitoring the condition.
"So, it's possible to find cancer biomarkers [that are detectable] at high throughput," he said. "It depends very strongly on the disease."
Markus Ralser, a group leader at the Francis Crick Institute and a developer of mass spec software and workflows for high-throughput proteomics experiments, said that it is still unknown what insights and biomarkers researchers might generate by measuring several hundred plasma proteins across large sample cohorts, given that the ability to do experiments at this scale is still very new.
"We don't know yet, because proteomics has only very, very recently achieved the level of throughput where we can go to population-scale experiments," he said.
Geyer said that because of past throughput limitations, it has been difficult to determine whether markers identified in discovery experiments reflected real biology or were merely artifacts of biological and analytical variation. The emerging ability to start with large sample cohorts early in discovery should help account for these sources of variation, making for more effective biomarker discovery even with relatively modest depth of coverage.
Schwenk noted that while larger cohorts will likely make for more effective protein biomarker work, equally important is having cohorts that have been very well characterized clinically.
"Doing a large population study without knowing much about the patients is probably less informative than doing a smaller, targeted study where you know everything about the patient," he said.
Jennifer Van Eyk, principal investigator for research and director of the Advanced Clinical Biosystems Institute in the Department of Biomedical Sciences at Los Angeles' Cedars-Sinai Medical Center, suggested that while higher-throughput plasma proteomic experiments might not directly measure low-abundance proteins, measurements of higher-abundance proteins could capture information about the behavior of the networks those low-abundance analytes function within.
Secreted proteins like "hormones, interleukin-6, BNP, those are going to be at low concentrations," she said. "But are you looking for the activation of those pathways? Activation of those pathways can be seen in higher-abundant proteins."
She said she believed that quantifying 500 or more proteins in plasma allowed researchers to gain insight into networks that touched almost the full plasma proteome.
"The pathways you are covering [with 500 proteins] are pretty extensive," she said. "I don't know that you need to go that much [deeper]."
While proteomics has in the past often been referred to as an unbiased and hypothesis-free approach to biomarker detection, Van Eyk said that to her mind it is essential for researchers to have some idea what they are looking for when starting an experiment.
"To go in without a hypothesis or understanding what you expect to see is just not the best science," she said. "There are not too many diseases where we are completely naïve about a potential mechanism. If your goal is to make a biomarker for a specific disease at a specific timepoint, then we should have matured past that."
She highlighted as an exception precision health and precision medicine research where the goal is not to identify specific markers for a specific condition but rather to identify proteomic signatures that reflect a person's movement in and out of various disease states.
"I would say that is very legitimate," Van Eyk said. "Because you have such complexity, if there is any place where you need to have really, really good quantitation and precision of measurement it is in that scenario. Because you want to measure a lot, but your accuracy has to be superb in order to pick up the small differences between, for instance, someone with [rheumatoid arthritis] and someone with irritable bowel disease."
"In that case, having more proteome depth is helpful, but with that you can't lose your precision of measurement," she said. "If you are measuring 2,000 or 3,000 proteins but you don't have specificity, which is [an issue with] something like SomaLogic, or you don't have accuracy, which can be a problem with some mass spec approaches, then you shouldn't necessarily go down that route."
Asked whether existing proteomic technologies offer the combination of depth, throughput, and precision needed to effectively pursue such precision health work, Van Eyk replied, "that's a great question."
She said that using its data-independent acquisition discovery workflow, her lab could realistically run around 3,000 samples per mass spec instrument per year, measuring around 600 to 700 proteins with coefficients of variation under 30 percent.
"We’re not quite there, but I think we're very close to getting into those larger numbers," she said.
Ralser said he also expected that improvements in machine learning and artificial intelligence would allow researchers to combine panels of markers in more informative ways, making better use of the proteins that can be accurately measured.
"There are very advanced statistical methods that can take into account not only one or two or three features that you measure, but effectively the entire profile," he said, though he noted that the potential of such an approach had yet to be translated into clinical reality.
And, of course, a variety of researchers and firms are working to move plasma proteomics technologies forward in ways that will reduce the trade-offs between depth, throughput, and accuracy.
In March, for instance, Swiss proteomics firm Biognosys said that it expects by the end of the year to begin offering a new discovery proteomics workflow that will allow it to quantify around 2,700 proteins in a typical plasma study, and that it expects to manage around 3,300 proteins in large-scale discovery studies by the end of 2022.
In an analysis last year of 141 plasma samples, proteomics firm Seer showed its Proteograph system could identify roughly 2,000 proteins.
Proteome Sciences has begun running large-scale experiments using its TMTcalibrator tool, which can quantify more than 5,000 proteins in plasma.
SomaLogic's aptamer-based SomaScan platform can measure 7,000 proteins in plasma, while Olink is able to measure 1,500 proteins in plasma and expects to expand that figure to around 3,000 in coming years.
"I'm curious to see what the next few years bring," Ralser said.