Rolf Apweiler, HUPO's President-Elect, on the Growing Role of Bioinformatics in Proteomics

At its recent annual meeting, the Human Proteome Organization elected Rolf Apweiler of the European Bioinformatics Institute as its next president. Apweiler is currently chair of HUPO's Proteomics Standards Initiative and leads the sequence database group at EBI, where he coordinates the UniProt/SwissProt database and other projects.

BioInform spoke to Apweiler by phone to get a better idea of his vision for HUPO and how his background in bioinformatics will help him lead the organization.

Congratulations on your appointment as HUPO's president-elect. Can you discuss how your bioinformatics background might help you in this role?

I was surprised that people put so much trust in me because I'm not really a proteomics person. I started as a biochemist, and proteomics is in some way biochemistry at a larger scale, but nevertheless, for the last 10 years or so I haven't done anything in the wet lab and was a bit surprised by [the election].

But on the other hand, if you look at the work of proteomics researchers nowadays, then one of the biggest bottlenecks, especially in large-scale mass spec experiments, is really the data analysis and the data quality and weeding out false positives and worrying about false negatives and so on. So in some way, that's probably understandable, and the area of bioinformatics goes across all subdisciplines of proteomics or even activities that are related to proteomics. So I think that's probably what qualifies me for this.

It appears that there's been quite a bit more focus on bioinformatics in the proteomics field recently, and a lot of development in this area.

I think it's sort of a natural process in all biology that moves from smaller-scale activities to activities that produce a large amount of data. Then, at some stage, bioinformatics becomes important. That was the case in sequencing when it moved from one postdoc sequencing one gene to the large-scale sequencing centers. Also gene expression: analyzing gene expression is really not something new, but bioinformatics became important for that with the advent of microarray technologies, when people started studying a lot of genes rather than the expression of a few genes. And proteomics is in some ways the same, and metabolism, metabolomics, and so on will be the next [area] that will be hit by this data avalanche, and they need to get people into the field who are used to working with databases [to] develop tools to handle the data and analyze the data and try to validate the data in an in silico way.

What do you see as the primary challenges in proteomics informatics right now?

For us, the biggest challenge right from the beginning was trying to find some common standards for reporting the data. That was the work of the Proteomics Standards Initiative. … This PSI effort was what we saw in some ways as the most important thing in proteomics, that you can really get standardized reporting of the data and collect the data, that you can exchange the data based on this standard and allow people easy access to the data. And there, we are on quite a good road. It exceeded my expectations, and we need to move forward from these standards to have international federations of repositories where we can store such data, and work with journals and with funding agencies to encourage scientists to actively submit data [in these standards] into these databases, so that they can then go to these databases and find all the data that is otherwise just buried in journals and very hard to harvest, to mine, and to combine with your own data.

On the other hand, there has been ongoing work in a lot of companies and institutes for better software for analyzing proteomics data, and also better statistics for validation of the results. There are a lot of other upcoming efforts, like how to combine these data sets with other data sets to make an easier comparison from proteomics to transcriptomics and so on, but that's probably still further down the research pipeline.

But data integration is one of the big troubles and worries in this. A lot of effort needs to be done there. It will only work when we have at least some basic standardized reporting, and a lot of data, and dumping this data into databases so that it's accessible and so that you have something that you can integrate. If it's fractured, and if it's spread out all over the world in thousands of little lab databases, then it's impossible to integrate it all. So you need to have some players in every country who try to be national or continental or whatever types of nodes for this type of data, and these nodes would then need to work together so that you can build an international federation of people who deal with such data.

What's the role of vendors in all this? It's my understanding that proteomics standards were a problem in the past because vendors weren't interested in standardizing formats so that they work across platforms. Is that changing now?

This was a worry we also had when people volunteered us to do something about standardization, but I must say that the response of the vendors has been much better than I anticipated. Vendors are present at all of the PSI meetings, and they are actively doing things, they are actively participating in the process. And it's quite clear that they all want to have common standards as an additional output format they can implement. What they are of course interested in also is a certain degree of stability. They can't implement new versions every month, so we need to have interaction with the vendors to have a clearly stable release cycle so they have enough time to adapt their software to enable new versions of these formats to be used. We can't do that too quickly, so we can only work in dialogue with them and compromise in a way that is really suitable. But I believe we're on a good way there. At all of the mass spec PSI meetings vendors were there, and I don't think we have a significant company that's not on board with it.

There's been so much interest recently in software and standards development on the mass spec side of proteomics. Are you seeing the same type of interest in, say, 2D gel analysis or other technologies for proteomics?

Well, the first area where we made a lot of progress was in the reporting of protein-protein interaction data. And all the protein-protein interaction databases have now implemented the PSI standards as a shared output format, so that you can go ahead and combine these data sets together. And also, there was agreement between all these databases that we can start exchanging data so that after a couple of years, all the data that was collected by the participating databases is available in each of the databases. So we work with five databases that together share the curation effort, there is no longer any duplication of work, and the same standard of annotation is applied at all the different sites. So that was the area where we made the biggest progress.

Mass spec is going quite well, and I guess that by the end of next year we will probably have a working federation of proteomics repositories dealing with protein identification. We'll use then the HUPO PSI standards for exchanging data among each other.

And we had in Geneva just last week a PSI meeting where the 2D gel side was discussed in a bit more detail. But I think there, it's still a much earlier time and I would say we're still in the information-gathering phase before we can move forward to consensus building. So at the moment, we're still sorting out the different use cases for work in this area.

There's a lot of movement in the field of proteomics toward biomarker identification. What role do you see for bioinformatics and standards in advancing this area? There seems to be some debate around the statistical rigor of some current methods.

I think that is more of a clinical statistics problem than a bioinformatics problem, and the bioinformatics part comes more in the research towards biomarkers, but then later on it's really a clinical statistics problem and has not so much to do with bioinformatics. But where we need to be helpful is if we have different disease areas where people use proteomics technologies to make comparisons between healthy and disease states for finding potential biomarkers and so on. Then we should be helpful in having some guidelines in what to do in a standardized way on the bioinformatics side, in the research towards finding biomarkers.

Biomarkers is the central theme, that's quite clear. But the work can only come out of the different communities working on certain diseases, and there, I'm happy to listen and try to figure out how we can be helpful from the bioinformatics side. But I think I see more that my future role there is to encourage research in this area, and use HUPO as an instrument to propagate the importance of these research fields at funding agencies, at levels where people should know about it and put their weight and investment behind this important area. I don't see that my involvement would be so much on the bioinformatics side, but more on clarifying the important dimensions of this field.

As the head of the PSI effort, you've already been able to wield some influence within HUPO, but what other opportunities do you foresee in this new role in terms of advancing proteomics?

For HUPO, there are two important parts that need to be done. One is really more of a housekeeping issue, in which HUPO becomes a proper part of the world of scientific organizations. It's still quite a young organization and it has proper foundations, but we still need to clarify the role of HUPO as an international organization to national proteomics societies. A lot of members of these proteomics societies are also members of HUPO, but [we need to] interact in a way with each other so that we are optimizing our combined voices to make us heard, instead of two organizations doing something in parallel and not knowing it. We should avoid such things at all costs.

Then the second thing is that we need to clarify what HUPO as a scientific organization will contribute to science, and how we interact then with the scientists and the funding agencies and so on. So at the moment, proteomics is very much seen as a technology, and mainly the large-scale use of mass spectrometry. But we need to show that we are unleashing the power of this technology to address very, very important biological questions like biomarkers or disease analysis of healthy versus diseased tissues or organs and so on, and that we can drive that from the proteomics side in collaboration with related disciplines, that we find a good way to propose interesting biological initiatives that are important and will be driven by proteomics that we can then try to encourage funding agencies worldwide to bring requests for applications and calls for proposals out. And HUPO as a scientific institution will then ensure that different principal investigators who will get individual funding in different countries know very well about each other's efforts and can try to coordinate across continental boundaries, and across national boundaries in a way so that the work is optimized.

HUPO is not a funding agency, but we are so far quite successful in getting proteomics researchers to talk to each other across boundaries and try to figure out how to work within the boundaries of national funding schemes together.

Sounds like a lot of work.

Well, I will not get bored.
