Human Proteome Organization Proteomics Standards Initiative
Name: Chris Taylor
Position: MIAPE Coordinator, April 2006 to present; co-chair of the sample processing working group at the Human Proteome Organization Proteomics Standards Initiative, April 2006 to present; lead coordinator of the MICheck project, April 2006 to present; senior software engineer at EMBL-EBI, 2003 to present.
Background: PhD, University of Manchester, UK, 2000; coordinator of various standards working groups at HUPO PSI, 2003 to present.
Taylor is heavily involved in the HUPO PSI effort to create reporting standards for proteomics research. He is also helping to create MICheck, a “central registry” of information on reporting standards that scientists, particularly biological researchers, can reference when submitting information to journals. The purpose of such a registry is to ensure that data, methods, analyses, and results are described at a level that allows colleagues and peers to ascertain for themselves the validity of experiments and research.
ProteoMonitor caught up with Taylor to see where both efforts currently stand.
Tell me about the ‘central registry’ and PSI’s role in its creation.
It’s a much broader project than just PSI. It’s drawn in all sorts of biologically delineated groups. (PSI) is just one of the players essentially. It was at our meeting that the discussions took place because we’ve really made an effort to reach out to other sorts of “omics” communities and to other biologically defined communities because in the course of our development it became pretty clear there were areas where there was a lot of overlap with different groups of people, and looking ahead it would be redundant for us to, say, develop stuff that talked about how you obtain specimens before you did any kind of processing. It would’ve been redundant for us to talk about general use of stats or to talk about general notions of project design or something like that.
So the whole point of the MICheck thing is really to find these areas of potential redundancy and to anticipate them, so that we have a sort of coordinating point where people whose interests overlap could discuss that overlap.
Basically we would want this to be a one-stop shop kind of thing for anyone that was looking for appropriate reporting guidelines for the work that they were doing. Then on top of that, once we have a reasonable body of stuff there and some good will from the community, what we would then want to do is get representatives from each community to work together to look at where the significant overlap [is], to look at where there are gaps, and think hard about how we try between us to produce a really properly integrated set of guidelines that covers a large amount of that space. So you’d be looking at modules that dealt with different sorts of biology, modules that dealt with uses of different types of technology.
Why is there such a need for reporting guidelines in proteomics?
Basically the level of annotation is pretty variable when people are reporting experiments. And without a decent amount of information about what was done, you really can’t understand the data. I mean, for someone to claim that they had a conclusion based on an experiment that they ran where they don’t tell you anything about the experiment, they don’t really say anything about the sample, or they don’t tell you much about how they did the data analysis—really, they might as well be making it up.
So if you really want to be able, first of all, to understand what was done, and second of all, to believe that what was done was valid and maybe corroborate it through another experiment—either [by trying to] repeat their experiment, or do something in support of it—you really need a lot of information about what was done.
This is information about the organism or whatever the biological material was that you were studying. It’s information about how that material was kept over time and what happened along the way, how you did the analysis.
Proteomics has been through this kind of a phase where really the goal was to an extent to get as large a list of identifications as possible. And now as it becomes a more mature set of technologies, people are more concerned about quality of data so they are obviously now concerned about the quality of other people’s data.
Why has the reporting been so bad?
Well, it takes a lot of time and the opportunity to disseminate the information is pretty limited.
First of all, often it’s quite hard to get the information out of a machine into a format that you can then pass along. That takes a lot of manual labor. Certainly in proteomics, we’re only really now starting to see the databases begin to be developed which allow you to put in a certain amount of annotation about what happened. Up to now, you couldn’t do it in a paper because space is so limited. Materials and methods sections are pretty short, and to go into huge detail about how you parameterize an instrument is not appropriate.
So overall, it’s this sort of horrible combination of the fact that it’s been quite labor intensive to share the information and that realistically there weren’t that many ways to share it even when you made the effort to share it.
How long is it going to be before we have these reporting standards?
The first set of MIAPE [Minimum Information About a Proteomics Experiment, a PSI-issued set of guidelines] modules went to Nature Biotechnology a couple of months ago. They’re in review and they represent the opinions of lots of people who’ve contributed to PSI, and in each case the contributors are a pretty heavyweight list of the good and the great from each of the particular fields. So there’s one for wet-work with gels, there’s one for mass spectrometry, and there’s one for the informatics around mass spectrometry.
[Nature Biotechnology was] in receipt of those papers about two months ago, maybe. We would hope that they would be out before the end of the year subject to review of course.
Then probably there will be sort of a settling-in period, where a lot of people will start to take that sort of thing seriously when they see it in print and their funders and the journals they’re trying to publish in are saying to them, ‘We need you to do this.’ At that point, we’ll get an awful lot of complaints, and probably those modules will go through a sort of further evolution, which will probably mean reducing the set of stuff that’s in there.
The other thing is that the mass spec vendors have said that once the appropriate MIAPE guidelines settle, they won’t just export in this public format—they will go a stage further and will actually export what MIAPE requires in that format. So really, being MIAPE-compliant in mass spec could be as simple as a button press. And if that’s true, then probably there won’t be much reaction.
Where there may be more in the way of problems is where there isn’t really any opportunity to automate reporting—things like describing the structure of a project and why you’re doing the work at all. With things like columns, the degree to which the columns and the data collection are automated varies a lot. Gels, again, it varies a lot.
Is the business community, the mass spec community, the chromatography community behind you as well?
Absolutely, because in their book, they’re there to sell solutions to experimentalists, and we’re essentially generating a problem for them to sell a solution to. They’re thoroughly into it.
So are we months away, years away or decades away from having reporting standards for the proteomics community?
It’s coming out as a patchwork. For transcriptomics, MIAME is pretty much a done deal where it’s appropriate. It doesn’t cover much of the biology, but certainly in terms of the use of arrays, it’s more or less an accepted thing that you generate that data set.
For proteomics, I think that getting to the point where this is really quite mature and people are doing it as a matter of course will probably take a couple of years.