NEW YORK (GenomeWeb) – As an assistant professor of computer science at the University of Montana and the CEO of mass spec software firm Prime Labs, Robert Smith has feet in both the academic and commercial sides of computational mass spec.
That positioned him well to conduct a study in which he interviewed 100 mass spec scientists to gather their thoughts on the state of mass spec software, the results of which he recently published in the Journal of Proteome Research and presented this month at the annual meeting of the US Human Proteome Organization. The conversations, Smith said, revealed a field where opinions vary widely as to the effectiveness and completeness of existing mass spec software and where users are broadly dissatisfied with their options.
GenomeWeb spoke to Smith this week to discuss the aims of the study, what he learned, and what it suggests about directions mass spec software development could and should take in the future. Below is an edited transcript of the interview.
How did the idea for this study come about?
Well, I guess for the last seven years or so, I've been very interested in better understanding what different approaches exist for problems in mass spectrometry analysis. I spent quite a lot of time looking into some of these in depth. I guess it's been sort of an evolving mystery, and I had a grant opportunity to better understand the space. I thought that would help me in my academic lab to fine-tune the direction we were taking and better understand what the needs of the field are as far as software goes and algorithms. I think I definitely got that out of it, but the results were quite surprising to me.
What did you find that surprised you?
On the one hand, I was shocked by the … prevalent attitude [among developers] that everything has sort of been done, and that [the approaches] being used currently are the best way to approach these problems. It was just really surprising that developers felt that way.
And then on the user side I was shocked at how much (I guess animosity is a good word) there is towards the great majority of the software that's out there. A lot of people were really livid about it.
It's very possible that, because of my recruitment method, only the most irate people accepted the interview request. It is possible that there is some sort of a sampling issue. But notwithstanding, I talked to 100 people, so obviously at least some people feel this way, and it's not a small number.
The disconnect between developers and users, is that to any extent just a matter of the developers knowing more of the landscape and what is available versus the users who might not be as deeply tapped in?
I'm not sure how well I can answer that question from the interview data. My personal opinion would be that it's a mix. In the community we persist in some technologies that everyone should know are very faulty but are still the status quo. I won't speculate on why that is, but it's unfortunate, and I think that it is a serious barrier to progress in the field.
The paper notes that one challenge in mass spec software development is the difficulty developers have in getting access to industry researchers to better understand their work and their software needs. How does this affect what kind of software is produced?
In industry you are often dealing with experimental workflows generating terabytes [of data]. There are some academic labs that do that, but it's not usually what they're doing. And what this results in is a very different weighting of priorities for [software] development, and that's part of the issue.
In the academic space there's always somebody who has a really nice balance of opinion and time. On the commercial side that's very hard to find because these people are getting paid based on their throughput, and they can't just stop everything for two weeks. But the main problem is actually not even time. There are enough people [in industry] who would be willing to coach the software development process if it weren't for the restrictions of the employee agreements, the non-disclosure agreements, and the policies these companies make.
I think these companies are really shooting themselves in the foot with such locked-down policies, specifically towards software development. It would make sense to me for them to become a little bit more open in allowing their scientists to communicate with software development companies, because it's a win-win. The [software developers] would be able to do their jobs better, but also the companies would have access to much more information to develop a product that's better suited to the people it's intended for.
Are the needs of users general enough that a company can plausibly write a piece of mass spec software that will address the demands of a majority of users?
On the commercial side I'm not sure I have the answer to that question, but on the academic side I know that the opinion of at least some big-name software development teams is that there are a lot of submarkets that aren't economically viable. One of the executives I talked to explained to me that their company, and it's a big one that people have heard of, specifically avoids the lower-volume use cases. This is … a real challenge for [mass spec] software. It's a complicated field, and there's a lot of domain information that you need.
The diversity of experimentation is humongous and the question is how [do you address it]. Let's just take a generic software development company. Why on earth would they chase this market? There are already so many different players in it and none of them, really, have cracked into a majority market share. What are you going to come in and offer these people that 20 people, 20 companies, don't already offer?
Is this a problem vendors should take the lead in tackling? Would, say, better software offerings help drive sales of their mass specs?
If you take a mass spec instrument company, their bottom line is instrumentation. Software is an afterthought. It's a marketing mechanism, not a product. So, sure, they sell it. They have salespeople. They go out and collect requirements and they actively try to get people on their software. But that's not what's seen as what's going to make them survive or perish as a company … so it's handled in a minimal way.
I tend to have kind of radical views about this, but I think that the best progress forward [for vendors] would be to completely open the software side of their instrument and get out of the business of doing software. I think that would be a win for the community and actually a win for them. They'd probably actually sell more hardware that way.
On the other hand, one of the takeaways of your study seemed to be that the open-source software produced by, say, academic groups, tends to come with downsides in terms of support or robustness or aspects where you might see a large company do better.
I think that these are actually two different topics. One is whether instrument vendors should continue making instrument software and downstream processing software, and there I think that if you're not going to do something well you probably aren't helping yourself by trying to do it. Then the second question is what about all these open-source projects for downstream analysis?
I started this process pretty ambivalent about the whole open-source versus commercial thing, and then this came up repeatedly during the interviews. I asked [one person] about his perspective on patents for software in this space and he said, "Well, you'd never expect a life-saving drug to have the R&D necessary to create it without the potential for paying it off."
And it turns out that there are exceptions, but if you want to really tackle and solve this fundamental problem that many, many people have failed to address, it's likely that one of the shortcomings of the approach … has been [a lack of] sufficient capital to tackle it. I think that, like you said, anyone can look at open-source offerings and see what the issues are: the disjointedness; the support is nonexistent in most cases. In a lot of cases, these projects, even if they're handled well, are very narrow because that's sort of the limitation of their funding stream. What's more common is that their purpose is to get more funding from the government, not so much to actually solve the problem for the users. I realize that's probably an offensive way of putting it for these folks who are putting their heart and soul into their project, but from a user perspective that's sort of what it seems like. Whereas, if you have a commercial product and people don't like it, it's going to die and you won't have a job.
I think it's an inescapable problem, because even if you have a group that does a really good job, and I could name a few, how are they going to recruit enough capital to take on the whole problem, not just a very narrow segment of it? I'm not sure any group has done that successfully.
Did the interviews point to any especially key problems or areas the field could most profitably focus on right now?
One would be better support. Users don't generally perceive they are getting a good deal of support from their software. Another would be in terms of quality — better testing of the software that's out there. I think that it would be nice for academics … in the computational space to spend more time on coming up with ways of evaluating software. That's something that my academic lab is more focused on now as a result of all this.
I think those are good starting points. And those are simple things. Support shouldn't be hard. Testing shouldn't be hard. [That they are problems] sort of highlights the problem in the way most open-source free software is developed in the field. These aren't large nationally funded efforts. They're more like little tools that were created in a lab to help solve a problem that either didn't have a public solution or where the PI wasn't aware of a pre-existing solution, and so they just kind of coded their own. Or they couldn't get the pre-existing solution to work! So it's kind of a chain reaction, where they [coded a solution] because the previous solution didn't have testing and support. That sort of development is never going to have sufficient support or testing because it's kind of antithetical to that process.
So that's something we can all work on to mitigate what I call the proliferation of methods, where you're not necessarily seeing innovation at the same rate as publication.