NEW YORK (GenomeWeb) – As one of the leaders of the Human Proteome Project (HPP), Mark Baker, professor of proteomics at Sydney's Macquarie University, has for years been involved in the effort to nail down evidence for every human protein.
Currently, the effort has high-quality data identifying roughly 82 percent of the human proteome, leaving 18 percent, or around 3,500 of the anticipated 20,000-plus human proteins "missing." Many of these missing proteins, though, Baker noted, are hiding in plain sight.
In a paper published last month in Nature Communications, Baker and his colleagues highlighted the need for HPP participants to use biological findings from outside the realm of proteomics as they chase evidence of the remaining unidentified proteins. To aid in this effort, the researchers have established a database called MissingProteinPedia, which compiles protein information that, while less stringent than the mass spec or antibody-based data required by the HPP, could prove useful for researchers as they work their way through the list of outstanding analytes.
The notion has its origins in a 2014 forum in which Baker and other members of the Chromosome 7 HPP team met with researchers outside proteomics to discuss the chromosome 7 proteins they had yet to identify.
"We invited pharmacologists, a lot of people working on G protein-coupled receptors, because our rudimentary analysis of what kinds of proteins were missing on chromosome 7 showed that there were a lot of [missing] membrane proteins, a lot or proteins associated with olfaction and taste," Baker said. "And these colleagues told us, 'You can't call these missing proteins because there is so much evidence about them.' It might not be mass spec evidence, but most people don't give a flying fig about whether it is mass spec evidence or not."
Sometime afterwards, as Baker tells it, he and his colleagues decided during a lab drinking session to see what they could dig up about several of their missing chromosome 7 proteins.
"We looked them up on Google and ran searches in mutational databases and pharmacological and drug reference sources, and we found that there was a bucketful of information already published about many of these missing proteins," he said.
He cited the example of the protein prestin, which, they learned, hosts mutations associated with hereditary deafness. This finding led the researchers to contact colleagues at cochlear implant firm Cochlear, which is headquartered at Macquarie.
"We spoke to one of their guys and they said, 'Oh, prestin is a well known protein. We know the mutations, we know where it is. It's just that it's only [expressed] in the [roughly] 500 outer hair cells of the ear,'" Baker said. "'The chances of you finding it by mass spectrometry are one in a gazillion.'"
This, he noted, reinforced for him and his colleagues the idea that identification of many of the missing proteins would likely require significant biological knowledge.
"There might be a lot of proteins that are so localized to particular structures and particular cells and at low copy number that we may never find them," Baker said.
"So we thought, let's put it all together, firstly to make sure that the rest of the world doesn't think HUPO and the Human Proteome Project are ignoring all their great science, and secondly, to give our chromosome teams and the biology teams that are looking for these proteins a lot more information on where to look and when to look and maybe under what stress conditions to look," he said.
The HPP requires high-stringency mass spec or antibody data before counting a protein as identified for the purposes of the effort. But, noted Shoba Ranganathan, professor of bioinformatics at Macquarie and senior author on the Nature Communications paper, these are of course not the only ways to establish information about a protein.
"There are, for instance, old-fashioned biochemical methods that never seem to be picked up [in the HPP efforts], so we thought, let's cast our net and figure out if there are scientists with experimental data where they have, for instance, isolated the protein, separated it on a gel, or even [generated evidence using] interaction studies," she said. "Anything that tells us, hey, this protein is there and it has a function."
Baker said that his team's analysis found potentially useful biological information on around 2,900 of the 3,500 or so outstanding proteins. And such information will likely be necessary to pin down a number of these remaining molecules, many of which are expressed either in very small quantities or under highly specific conditions.
He noted the example of olfactory receptor proteins, a class of protein that has stymied proteomics researchers for years. According to the Nature Communications paper, of the 411 olfactory receptors, just two are considered detected according to the 2016 release of the protein database NextProt, and neither of these has been identified sufficiently to qualify as "found" under the HPP criteria.
Baker noted that a look at the biology underlying expression of these receptors provides some insight into why they've been so difficult to find.
"We tried to get pathologists who could give us olfactory sensory neurons and things like that," he said. "But when we learned a bit more about the biology of it by opening up to other sources of data we found that of the 400 or so olfactory receptor proteins, the vast majority are switched off. Cells focus on a particular type of molecule they want to capture and detect, and so you can almost train your brain to produce those receptors by being exposed to particular smells."
In other words, to detect a particular olfactory receptor, researchers need to collect not only samples of the specific cells expressing these proteins, but also samples from people who have been exposed to the smells that would trigger their expression.
Given such challenges, it is perhaps unsurprising that, according to Ranganathan, an analysis of the HPP's progress suggests the group will need another 10 to 40 years to identify the complete human proteome.
Baker said, however, that he believes increased use of outside biological knowledge could speed the process significantly. "Our hope is that we can cut that time in half," he said.