Who: Robert Chalkley
Position: Assistant adjunct professor, pharmaceutical chemistry, University of California, San Francisco, 2004 to present
Background: PhD in biochemistry, University College London, 2001; postdoc work at UCSF, 2001 to 2003; assistant researcher, UCSF, 2003 to present.
In 2004 Molecular & Cellular Proteomics began asking proteomics researchers for more data and greater depth of information in connection with manuscripts they were submitting to the journal for publication. Since then, a number of other journals as well as funding agencies have followed suit. With those demands, as well as improvements in instrumentation that have increased the amount and kind of data that can be produced, some researchers are struggling to comply with journal data submission recommendations and requirements.
At the 8th International Symposium on Mass Spectrometry in the Health and Life Sciences in San Francisco last month, Robert Chalkley and Katalin Medzihradszky, adjunct professors in pharmaceutical chemistry at the University of California, San Francisco, presented a poster on how researchers can comply with journal guidelines and why some information is necessary for manuscript submission.
Their poster and tutorial can be found here. Below is an edited version of a conversation ProteoMonitor had with Chalkley about the issue of data submission.
How were the Molecular & Cellular Proteomics guidelines developed?
It was recognized several years ago that there were problems within the proteomics community about the reliability of the data being published. There was quite a bit of data being published where it was very difficult to assess how reliable individual results were. So the journal Molecular & Cellular Proteomics decided it wanted to do something about addressing this.
So in 2004, they came up with some very preliminary guidelines about what information needed to be supplied in order to assess the reliability of results. But there was no point in the journal having one set of guidelines if the rest of the community was not applying anything.
Therefore, the journal sponsored a meeting in Paris in 2005 where they invited about 30 people representing the community. There were researchers, there were instrument manufacturers, there were people who were writing search engines, and then there were representatives from all the major proteomics journals.
In this meeting, we spent a couple of days starting out with the guidelines that MCP had come out with and basically editing them and coming up to a consensus on a set of guidelines which everyone there said were reasonable and should be applied.
Those are the guidelines which are up on the MCP website and they were published in 2005.
What are some of the key points of the guidelines?
The major requirement of the guidelines is [that there should be] sufficient information put into the manuscript, so that the reviewer of the manuscript can say whether the results are reliable or not. It says that if you're doing database searching, you need to say how you created the peak list, what parameters you used for the database search, and how you assessed the results.
That’s the overarching part of it. There are also requirements here for data where there’s the greatest risk of misinterpretation. For example, if you’re reporting proteins on the basis of one peptide identification, then the guidelines ask that the spectrum that you used be presented. Or, if you’re reporting sites of post-translational modifications, then they ask that you show the spectrum from which you identified the modification.
Are the guidelines meant as a way of breaking through the reluctance that people have of reporting their data in general, as well?
There is obviously some reluctance to the extent that it’s just extra work if you have to do all this stuff. And that was one of the issues that came up in coming up with these guidelines. We didn’t want to be excluding people from publishing just because of the burden of getting the data into [a] format that’s acceptable for publication.
To some extent, that’s one of the reasons other journals aren’t applying these guidelines. Some other journals have guidelines that I would describe as a watered-down version of these. They’re less demanding than these. And one of the reasons they’ve done that is they’re concerned that some authors are not going to be able to meet these guidelines and expect that if they’re asked to do this, they’ll just try to publish elsewhere.
What do you mean by watered down?
There are [fewer] demands. For example, they don’t ask that you supply the spectrum if you’re reporting sites of modifications. If people meet the MCP guidelines, they’ll meet any other guidelines. It’s not necessarily [true] the other way around, though.
What are the areas where you’re seeing researchers having the most difficulties in complying with the guidelines?
There are several things that they’re having problems [with]. Most of them are just the hassle of creating these … spectra to exemplify sites of modification or single peptide identifications. Depending on what software they use, that can be easy or difficult. To do this, some people are taking screen captures. If the paper is identifying three or four sites of modification, then that’s not a burden, but if it’s something where they’re reporting 1,000 new phosphorylation sites, then it’s quite a burden to compile 1,000 spectra into a file for submission.
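The batch-export chore described above can in principle be scripted rather than done by screen capture. As a minimal sketch, assuming peak lists are already available as (m/z, intensity) pairs and using matplotlib (the labels and data here are purely illustrative, not drawn from any particular search engine):

```python
# Sketch: batch-render annotated spectra to one multi-page PDF for
# supplemental submission. Assumes matplotlib is installed; the peptide
# labels and peak lists below are fabricated examples.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def export_spectra(spectra, path):
    """spectra: dict mapping a peptide label to a list of (mz, intensity)."""
    with PdfPages(path) as pdf:
        for label, peaks in spectra.items():
            mzs = [p[0] for p in peaks]
            intensities = [p[1] for p in peaks]
            fig, ax = plt.subplots(figsize=(6, 3))
            ax.vlines(mzs, 0, intensities, linewidth=1)  # stick spectrum
            ax.set_xlabel("m/z")
            ax.set_ylabel("intensity")
            ax.set_title(label)
            fig.tight_layout()
            pdf.savefig(fig)  # one spectrum per PDF page
            plt.close(fig)

# Toy example with two fabricated peak lists:
demo = {
    "PEPTIDER (1+), phospho-S3": [(175.1, 120.0), (262.1, 300.0), (456.2, 90.0)],
    "SAMPLEK (2+)": [(147.1, 200.0), (301.2, 150.0)],
}
export_spectra(demo, "supplemental_spectra.pdf")
```

With real data, the dictionary would be populated from the search engine's exported peak lists, so even a 1,000-spectrum supplement becomes one loop rather than 1,000 screen captures.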
Most of the other problems are about missing various parameters … let’s say, not saying what mass accuracy you satisfied or not saying what score threshold you used for acceptance of results.
Is the process of submitting proteomics data particularly tricky?
I don’t think it’s particularly tricky, provided that you’ve actually read the guidelines before you try to submit. I think that some people run into issues – they write a paper and then they try to decide where they’re going to submit it, and then they realize what they’ve done isn’t in the right format for where they’re trying to submit it.
Do these guidelines have the effect of changing researchers’ workflows or is it a matter of their being more scrupulous and careful about their note-taking and data gathering?
Some of it may be changing the way they’re doing the work. For example, quantitation has been a big growth area, so there are guidelines on that. One thing these guidelines talk about is the need to assess both analytical reliability within your experiment (how accurately you can measure a quantity) and biological reproducibility: doing a repeat preparation or verifying results by another means to determine whether the results you see, even if they look very reliable statistically, are actually biologically reliable.
So, I think this is putting more pressure on people to actually think about experimental design and whether they’re actually going to get biological results in their experiments.
Aside from difficulty complying with these guidelines, are you seeing great resistance to them?
I think it’s very difficult to tell. At the [International Symposium on Mass Spectrometry in the Health and Life Sciences], we did give an oral presentation. Not many people turned up. It wasn’t very well advertised, unfortunately.
We are going to be doing a series of these presentations at various conferences to try to get more feedback from people, so I think we’re going to give a presentation at ABRF, and probably at next year’s ASMS. We’re trying to get more feedback and comment, and hopefully, at some point in the near future there may be another round of people getting together and refining the guidelines.
Are there going to be changes made to these guidelines to take into consideration advances in technology and methods?
I think there are going to need to be some changes. Probably the quantitation material is something that’s going to need to be looked at, because MCP came up with an initial set of guidelines in 2004, when almost no one was doing quantitation, so this is only the first iteration of the quantitation guidelines.
Maybe [there needs to be more] talking about statistics and good practice, how to validate results, because I think that is a general issue within the community. There are quite a lot of people doing these quantitative studies, but they don’t really understand the statistics of how to verify the results.
The Human Proteome Organization's Proteomics Standards Initiative is trying to put together its own guidelines in terms of the kind of data it wants to see reported for experiments. How does what you’re doing fit in with that?
There is a little bit of overlap. But the guidelines that they’re coming up with have a slightly different emphasis. They’re coming up with what they’re calling MIAPE, or minimum information about a proteomics experiment. And those guidelines are much more about actually good note-taking and logging what you did.
The MIAPE guidelines talk about who did what experiment on what day, where the vial is stored, where the file is stored and so on. So that’s much more about information management, and there’s very little about the aspects of good experimental practice.