NEW YORK--Forecasting strong double-digit growth in the bioinformatics market for at least the next several years, major bioinformatics hardware suppliers are significantly ramping up efforts to improve their understanding of the field and work more closely with customers to address the challenges of bioinformatics. To get a better understanding of the hardware vendors' perspectives on the market, BioInform talked at length to representatives of several major companies that sell bioinformatics researchers everything from mainframes to desktop systems.
Too Much Data
The top user concern cited by virtually all the vendors is managing an overwhelming amount of data that seems to be growing exponentially. "A lot of these databases are getting so large now it's tough to keep them in sync," commented Gerald McAndrew, Sun Microsystems' worldwide group manager for the chemical and pharmaceutical industries. "It takes a long time to synchronize these large sequence databases. Some of the big databases like GenBank were taking days to synchronize."
"The issue is data size," agreed David Valenta, scientific research program manager for Hewlett-Packard. "That's the biggest problem they always bring up: take care of my database problem." Sharon Nunes, research manager in IBM's computational biology division, concurred. "Data management is one of the key challenges," she told BioInform. "How do you manage the massive amounts of information? One of our customers is adding literally hundreds of megabytes a week to their storage system and I know they're not alone in this arena. How do you find what you need to find in all the data that are coming online, both within one company and into the public domain? How do you share that information appropriately, and how do you get the key pieces of information to the desktops of the people who need it to make decisions?"
"One thing we find is that as soon as people have a new machine, they fill it up, storage, memory, everything. It's like, okay, I need a new one," Nunes continued. "Several of our customers have said the same thing: I give these bioinformaticists a new machine and tomorrow they come back and say they need a new one. From a systems standpoint it's a phenomenal challenge to integrate all this information and to manage it."
"There's a proliferation of this gene data becoming available and all the companies are looking for ways to sequence faster," observed Larry Greene, Digital Equipment's pharmaceutical industry director in the US. Meanwhile, Lionel Binns, Digital's industry director for chemicals and pharmaceuticals in Europe, singled out the storage issue. "There's so much data," he said. "Two or three companies at least are holding terabytes of genetic information in storage. And it will only get worse because the amount of data is going to get bigger and bigger. Throughput becomes the limiting factor. There's too much data for humans to deal with in a capable way. The focus over the next couple of years will be on data: help us sort out this mess."
"If you look at where we stand relative to the Human Genome Project, we're barely scratching the surface," McAndrew noted. "I hear numbers thrown around that maybe 2-4 percent of the genome has been sequenced. We have a long way to go to the year 2005. And if you look at the data curves and you look at some of the storage requirements, it's an extremely interesting market."
Problems with the volume of data are only compounded by the fact that much of the information is in incompatible formats, making data integration another key issue for end users.
"Integration is probably the number one challenge to the industry," claimed Juli Nash, biology market manager for Silicon Graphics. "They're managing a lot of databases that come from a lot of different resources and have a lot of intranet/internet connectivity issues that they have to sort out."
"Another trend that they're all wrestling with is, I've got all these different types of data in disparate databases. How do I integrate it, pull the data into the system supporting it to try and identify new drug targets faster?" Greene added. "This is an enormous challenge. One CIO described it as a mess." Valenta agreed, "There are three or four different kinds of databases that function differently that are not at all related to each other." And McAndrew remarked, "A lot of the companies are looking at trying to integrate a lot of their scientific knowledge and databases for both chemical and biological areas."
Many users are taking a brute force approach to the problems, trying to tackle them with raw computing power. "Technology gives them a competitive edge," Binns observed. "The increase in power and in affordable storage and decrease in cost of workstations. Those three things are a major factor. They need a lot of power to perform the processing involved. They need affordable high-performance computing. Take away the technology and you've got nothing left but science with good ideas." His colleague Greene concurred, "Anything that can help optimize their environment so they can sequence more of these genes faster, they're all looking forward to that. Chip speed is important, internal memory is important, and the last thing is the speed of the bus."
"In the national labs and particularly in Asia, we're seeing a centralization of resources that creates big compute requirements, the biggest database environments you can imagine," Nash stated. "Their needs seem endless. So the compute requirements are big servers, big scalable environments, scalable to the largest systems available for Unix. We also see those installations needing very serious horsepower for analyzing the structure of proteins, and they're using Cray architectures for those analyses. So we see, at the highest end, very high pressure on the biggest systems that Silicon Graphics and Cray can provide."
"One of the reasons this market has been a tremendous one for Silicon Graphics is that it is insatiable in its ability to use compute cycles, to demand scalable environments, and not just environments that scale to, say, 20 or 30 processors, but environments that scale to 128 processors and get linear application performance," she concluded.
McAndrew commented, "During the past year we have enjoyed a lot of success on the high end of our SPARC server offering. We saw a trend definitely moving toward the Enterprise 10000, as well as the other high-end platform, which is the Enterprise 6000. That seemed to be a lot of the sales that we started to experience later in the year. Earlier it was more 3000, 4000 servers. I definitely saw a trend this past calendar year toward higher-end servers. If you look at the work being done right now, it's just more and more compute cycles and master cycles."
Role of Specialty Providers Unclear
The role of specialty bioinformatics hardware providers in the market is unclear, according to the mainstream equipment providers. McAndrew, for one, claimed, "I think there's a role for the specialty processing units, like those from Compugen and Paracel. There are a few specialty boxes out there. Whether that is a trend, I don't know; I didn't personally see a lot of competition from specialty boxes."
However, Valenta disparaged the bioinformatics-only equipment. "I'm seeing that the mainstream companies are not interested in what I call point solutions," he contended. "They're interested in buying computing power that can do their gene sequencing or gene search, but on off-hours can also run the payroll. I'm definitely seeing that, especially where I'm talking to pharmaceutical companies. Now if you talk to a researcher at a smaller company, they would buy the point solution, probably. And after about six months to a year they're going to find out that it's not the right way to go. I'm finding that servers that are multipurpose servers are a much better value proposition. The bioinformatics-specific equipment is too narrow, too inflexible, definitely."
Flexibility does seem to be valued by customers, the vendors reported. "They're now depending on bigger servers that are more multifunctional that can support not only database activities, but also high-performance algorithms, data mining, and still deliver to a broad diversity of clients the visualization and the desktop environment that a biologist can work with," Nash said.
McAndrew noted a similar trend. "I've seen dedicated systems at the low end, but when you get into the large systems we're seeing other work being integrated on the platform," he told BioInform. "It's usually complementary, though, because it's all in the scientific area."
Perhaps related to the flexibility issue is the growing interest seen in platform-independent Java and CORBA technologies. McAndrew, whose company developed Java, observed, "We saw a major increase in Java and CORBA in the industry. I'm surprised, Java is really strong in bioinformatics." Nash also noted how "the bioinformatics community is moving forward with the application of CORBA technology, trying to build standards through the Object Management Group Life Sciences Group."
Coming in the next issue of BioInform: the conclusion of our exclusive two-part look at hardware vendors' perspectives on the bioinformatics market. How fast is the market growing? Who makes the ultimate purchasing decisions, bioinformaticists or corporate IT staff? What new products will vendors offer the bioinformatics market this year? All that and more on February 2.