A recent survey of around 100 bioinformaticians conducted by Eagle Genomics found that data integration is one of the largest technology concerns for the community, and while both academic and commercial groups still rely to a large extent on in-house data centers, they are considering shifting to the cloud in the future.
The survey showed that most respondents are primarily involved in analysis of gene expression and genomic variation, though there appears to be a shift in focus towards metagenomics, systems biology, and pathway analysis. Furthermore, although comparative genomics is a current focus area, the survey showed that its popularity is likely to wane over time.
Most respondents indicated that integrating data is a key research area for them, followed by de novo genome assembly, genome resequencing, and RNA-seq analysis. Eagle said that 60 percent of respondents reported that they were already working on data integration.
Meanwhile, barely 25 percent of bioinformaticians currently work on microarray analysis, and even fewer anticipate working in that area in the future.
When it comes to hardware, both academic researchers and their commercial counterparts primarily rely on in-house compute clusters and servers for their analysis with only a small minority running their pipelines on the cloud. However, researchers in both academia and industry indicated that cloud computing is definitely on their radar and they will consider it in the future.
Most respondents indicated that they currently take advantage of open-source software and plan to keep doing so. Reliability and scientific validation were ranked highly in terms of considerations researchers take into account when selecting open-source software, while commercial support was considered the least important requirement. On the downside, most respondents agreed that open-source software packages are not easy to integrate with each other.
The results of the survey are available here.
This week BioInform discussed some of Eagle's findings with William Spooner, Eagle's founder and technical director. Below is an edited version of the conversation.
What was the impetus for the survey?
It was pretty simple. We are professional bioinformaticians and we are looking at a rapidly changing environment in bioinformatics with a lot of new technologies that are coming on stream and a lot of very data-intensive technologies that are producing a lot of data. We are looking at this whole landscape and saying, 'How is the analysis and the market for the analysis going to change in response to the new data-generation technologies?'
It started off as pure market research for our benefit and we were originally intending to do an e-mail and telephone [survey] of selected people to try to gauge their responses. But since we had gone to the trouble of designing the survey, we thought the most useful thing and the best thing for the community would be to run a standard web survey and make the results freely available.
There were 118 respondents. Fifty percent were from universities, a further 20 percent were from non-profits, and everybody else was split between pharma, agribiotech, and biotechnology, so probably about 30 percent were from the commercial sector.
Were there any surprises when you looked at the results?
There were two real surprises. The first of them was that data integration was the most used technology out of anything. People are really using data in the context of the public reference. The other thing was really how much the microarray techniques have completely fallen behind the sequencing techniques in terms of the technologies that are actively being used, for this particular poll.
The results show a shift towards systems biology and pathway analysis, and to some extent proteomics. Why do you think that is?
There were a large number of academics in this study, so … they will be looking for the next big thing, [a trend we see] as consultants. We are finding that some of the people coming to talk to us are talking more about metagenomics, pathways, and de novo assembly.
Most folks, at least on the academic side, seem set on sticking to in-house resources. As such, it appears that outsourcing firms are missing out on a huge share of the market when it comes to academics. Would that be a fair assessment?
Based on these results, it is very clear that there is not a great deal of appetite among professional bioinformaticians to start outsourcing bioinformatics. So that’s not surprising.
What will be very interesting will be to repeat this survey again in a couple of years and see whether these attitudes are changing. Certainly having your analysis done by a third party is becoming very mainstream within molecular biology. Even academics will send out their samples to a core facility, and that’s pretty much outsourcing. I think there has been a very strong shift towards outsourcing of the data generation, and whether the same thing is going to happen in some of the analysis — especially some of the primary analysis ... remains to be seen.
On the cloud computing front, both academics and commercial companies are more willing to outsource their computing requirements as well as consider cloud computing. Any thoughts on this?
For the business that we are doing, we couldn’t do it without cloud computing, so really we see that there is a huge value in cloud computing. It's now matured to such an extent that the immediate security concerns have gone away, now that some of the cloud vendors have started to operate in regulated environments.
It's almost shifting to where people are seeing that the cloud is potentially more secure because somebody else is looking after the infrastructure to a certain very high and well-documented standard. I think it's really becoming fairly clear that this is the way forward, and I think the survey bears that out. There is a lot of support, and a lot of people are anticipating using the cloud.
You ran the survey prior to your symposium dubbed "Provisioning Bioinformatics for the Next Decade: Are we prepared?" Based on the results of this survey, is the community prepared?
There are three things that we will need to do. There will need to be more bioinformaticians and more efficient bioinformaticians; there will need to be more hardware and more efficient hardware; and the same with software.
Looking at this survey, it's clear that the hardware and computational side of things is being addressed and people can see the way forward for that, but for software and for the head count of bioinformaticians, I don’t think we have hit a tipping point on that yet.
Clearly people are saying we need more hardware and somewhere to run the analysis, but in terms of the software and the number of bioinformaticians, I don’t think that there is that much pain.
Certainly what I hear is that it is very difficult these days to recruit experienced bioinformaticians, so I think the pain is going to increase and there has been a lot of commentary around that.
There was a paper out of Washington University last year by [Elaine] Mardis about the $1,000 genome and the $10,000 analysis, and I think those attitudes are going to become more prevalent as people find it difficult to get data processed.
Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.