ROUNDTABLE PARTICIPANTS (left to right):
Peter Morrissey — worldwide manager for healthcare and life sciences, IBM
Matthew Trunnell — managing director USA, Genedata
Michael Athanas — principal investigator and founding partner, The BioTeam
Jeffrey Wiseman — vice president of technology and informatics, Locus Pharmaceuticals
It’s a rare chance to have four industry experts on high-performance computing trapped in a room — but it’s one that many IT users might give their eyeteeth (or at least a few processors) for. Genome Technology hosted this discussion in mid-February to get to the bottom of how researchers involved in HPC can get the performance they need but keep costs down at the same time.
Our participants — one from IBM, one from a small pharma, one from a software development firm, and one from an IT consulting company — spent more than an hour debating the major issues and comparing advice. What follows is an excerpted version of that conversation. Read on to get their takes on costs of software development, how to balance resource utilization, the importance of scalability, and more.
Genome Technology: When we were thinking about topics for this discussion, there was no getting around the fact that the cost of high-performance computing seems to be an increasing focus in the field.
Trunnell: It seems like over the last two years, three years maybe, that bioinformatics as a discipline has really been called to task to justify its existence — or at least its expense. A lot of the internal IT departments and the internal bioinformatics departments I’ve dealt with over the last year have seen hiring freezes and budgetary freezes. I think this is actually distinguishable from just the economic downturn: there really has been a critical eye internally at large organizations saying, ‘We’ve spent a lot of money over the last five or 10 years on people and hardware and software, and what’s come out of it?’
One major trend in the past year is that people have been a lot more cautious, which is a good thing. We have to ask the question of where is large-scale, high-performance computing really required.
Genome Technology: What is the biggest cost-saving decision you’ve made for your IT infrastructure — or for your customers — in the past year?
Wiseman: From my experience at Glaxo running the cheminformatics department, we did a lot of programming. To me probably the single biggest cost in the whole industry is that you can’t even get close to harnessing the power of the computer system.
I had 50 programmers and there were four groups like mine serving research at GlaxoSmithKline, so let’s say there were 120 programmers. Well, they’re all doing the same thing. What an incredible waste when you’re faced with this kind of task.
Trunnell: One of the biggest, sensible cost-saving decisions that I’ve seen people start to make is to acknowledge that their software demands aren’t so unique, and to start to actually buy software.
Morrissey: From a high-performance computing perspective, just throwing more iron at the problem is not the solution. We sell three things at the end of the day: hardware, software, and services. In excess of half of our revenue — $80 billion of revenue — is services-driven. As much iron as you want to throw at a problem, if you don’t have the right consultative services, either programmers or the right folks to look at a problem, it doesn’t matter how much iron sits on the floor.
A number of pharmas are now saying to IBM, ‘We do have so many programmers, so many solutions, can you come in and help with maybe a pipeline methodology to help piece things together?’ I’ve been frankly quite surprised when we go into companies and do some utilization reviews [to see] how few cycles they’re actually using [compared to] what they have available.
What we’re seeing is that the IT departments are starting to have to share some of the pain that the internal businesses have to deliver. By that I mean if you have discovery or target validation, those folks are tied much more directly to the IT folks. ‘This is what I have to do, but you know what, IT, this is what you have to do to help me get to my end game.’
Athanas: We see this all the time — people inappropriately acquire resources that they think reasonably on the back of the envelope they’re going to need. Unfortunately, they haven’t made the connection of how they’re going to use it. You can avert that a little bit by taking that back-of-the-envelope calculation and doing some real project planning. Also in terms of cost savings, it’s the type of resources people use — hardware is a small part of what people are really spending money [on], but it’s how they use the hardware.
I’ve seen people … their pipeline consisted of loading GenBank [into] a Perl hash table, and they needed a 32-gigabyte address space to do that on. That might work, but that’s a very expensive approach. So it’s the approach you take combined with the infrastructure that you built that’s going to give you a cost-effective solution.
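[Editor's sketch: one way to read Athanas's point is that a pipeline can often stream records one at a time instead of holding the whole database in memory. The Python below is a minimal illustration, not anything the participants described. It assumes the sequences are available as FASTA-formatted text, and the function names and sample data are hypothetical.]

```python
import io
from typing import Iterator, Optional, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Only the current record is held in memory, so the footprint is
    bounded by the longest single record, not by the whole file.
    """
    header: Optional[str] = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_record(handle: TextIO) -> Tuple[Optional[str], int]:
    """Example analysis: find the longest sequence without storing all of them."""
    best_header, best_len = None, 0
    for header, seq in iter_fasta(handle):
        if len(seq) > best_len:
            best_header, best_len = header, len(seq)
    return best_header, best_len


# Hypothetical three-record input standing in for a large database file.
sample = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACGTACGTAC\n>seq3\nA\n")
result = longest_record(sample)
```

[The same loop works on a file opened with open(), so the approach scales with disk rather than with address space. Whether it fits depends on the analysis: lookups that need random access to every record would call for an index or a database instead.]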
Wiseman: Let’s go back to programming because most of the waste is around software development.
Trunnell: In-house software development?
Wiseman: Not [just] in-house, to be fair. My experience is it’s not cheaper to go out because you do face some overhead.
Morrissey: There’s a certain mentality: what I know is my domain and if I share a piece of it with anybody — because it is a small market, with pharma really we’re talking maybe eight to 12 big boys out there — if they share a piece of what they know they lose their competitive edge. So what you see are these silos getting taller and taller and taller.
Wiseman: Everybody’s doing the same thing. Everybody’s trying to sell it internally as a competitive advantage. But it’s not.
Athanas: Are you saying the infrastructure type software, the load management software?
Wiseman: It’s not a competitive advantage. It’s a commodity.
Athanas: [What about] the algorithms you developed that make your analysis faster?
Wiseman: Inside Glaxo that would not be a competitive advantage; that would not transform the industry. That would not increase the profit margin of Glaxo. You can cut out early discovery 100 percent and not change your profit [right now]. Your cost in the early phases is just not that big — it’s only a small percentage of [total costs]. Biology has transformed the industry but it hasn’t changed the cost, it’s only raised the bar.
Genome Technology: Speaking of resources that readers already have, is there some blanket rule we can give for how to evaluate technology, software, the people running it — and figure out exactly what they need?
Trunnell: It’s still very difficult. It’s two different languages and people aren’t good at expressing biological needs in terms of IT requirements. You have to be fluent in two different spaces and most people just aren’t.
Morrissey: Before you jump in and buy the big iron, think about how you’re going to apply it. Do some more due diligence. There’s such a rush in this industry to be first in class or best in class, but oftentimes if you would do the diligence up front and vet the problem you’d be much better off downstream.
Athanas: An important thing people have to put in their vocabulary is scalability — at the hardware level, but it’s also very important at the software level. You have to think about what you need today, but put a lot of emphasis on how to build 100 times more. Network [specs], your intercommunication, software development, training. It’s a good investment, I’ve seen it over and over, where the payoff is very worthwhile.
Wiseman: The training is an interesting comment because it’s something that is very hard to actually think about.
Trunnell: One of the things that we’ve found in our commercial software space is that we always try to bundle training when we sell software but typically we have to almost force training because people don’t make room for it as an investment. It always pays off — it makes people more effective in using whatever tool they’re using if they get training. It’s worth investing the day or two days of a department’s time if you’re bringing new tools in, and doing that on a reasonably ongoing basis.
Wiseman: Ongoing is very important. You very seldom use everything all the time.
Genome Technology: We’ve talked about cost savings. What about going forward, what’s the most affordable way someone can improve performance?
Trunnell: I’ve often found that even within a relatively small environment, resources are not well balanced. I worked in a genomics company that had an enormous computational infrastructure, and the bioinformaticists were just screaming that they needed more power. And it’s because 80 percent of the resources were going unutilized. There is some benefit to be gained, if there is a reasonably large infrastructure, in looking at balancing resource utilization.
Athanas: You wouldn’t want to run your Blast jobs on a 32- or 64-CPU mainframe — that’s a total waste of money — if that’s in demand for other useful, large, shared-memory applications. Rather, offload your applications that fit nicely on commodity, cheap equipment; do that as much as possible so you can extend the life of these big investment machines.
Morrissey: Look at the calculations and the computations — is it integer-based or floating-point intensive — and then build the architecture appropriate to that. A lot of integer-based calculations are probably much less expensive to run and acquire but what we see oftentimes is that people will spend enormous amounts of resource for large computational systems and then apply them [in ways that] if they had done it differently, you could see significant cost savings.
Wiseman: The adage that you get what you pay for is really true. Locus got really burned because they wanted the biggest cluster in the world, and that was the most important driving force. [That] wasn’t totally stupid, but we can’t afford the biggest cluster if it’s a really good cluster, so you get a really cheap big cluster.
Really think about what you need. And then don’t worry about the stuff that’s not important. Don’t worry about investing years and years trying to figure out how to buy one computer when it’s not that important. Put your time and your energy where it’s really important.
Athanas: Quality investment in hardware in critical places is important, but the actual computation — the workhorses of a computing infrastructure — should be like light bulbs. Identify fault tolerance within your system, software and hardware, and that will empower you to take advantage of whatever’s the latest and greatest today. We’ve had clients [say], ‘We want 600 nodes.’ And we went through the analysis and in the end we only deployed 60. It suited them perfectly well and we also gave them a plan for how to grow.
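[Editor's sketch: the kind of sizing analysis Athanas mentions can start from a simple workload estimate. The Python below is a rough illustration under stated assumptions, not The BioTeam's method. The function name, the workload figures, and the 70 percent utilization target are all hypothetical.]

```python
import math


def nodes_needed(jobs_per_day: float,
                 cpu_hours_per_job: float,
                 cores_per_node: int,
                 target_utilization: float = 0.7) -> int:
    """Estimate how many compute nodes a steady daily workload requires.

    target_utilization leaves headroom for bursts and node failures;
    0.7 is an illustrative assumption, not a recommendation.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total CPU-hours the workload demands each day.
    demand = jobs_per_day * cpu_hours_per_job
    # CPU-hours one node can usefully deliver each day.
    supply_per_node = cores_per_node * 24 * target_utilization
    return math.ceil(demand / supply_per_node)


# Hypothetical workload: 2,000 jobs a day at half a CPU-hour each, on 8-core nodes.
nodes = nodes_needed(jobs_per_day=2000, cpu_hours_per_job=0.5, cores_per_node=8)
```

[With these made-up inputs the estimate is 8 nodes. Rerunning the function with projected job counts gives a first-pass growth plan, though a real analysis would also account for memory, I/O, and job scheduling constraints.]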
Trunnell: It’s OK to ask for advice during this preplanning. One of the challenges in life sciences is that there’s a really strong ‘not invented here’ attitude. It’s not just software, it’s also IT infrastructures. And most times, the problem’s already been solved.
Genome Technology: What about breakthrough technologies coming along in the next few years?
Athanas: Perhaps some of the data-integration technologies that are coming up, in terms of software abstraction. I think there’s a chance that the payoff of that could be very substantial — talking about ontologies — if some of these technologies are more and more adopted.