HILTON HEAD, SC--At last month's Genome Sequencing and Analysis Conference here, BioInform sat down for a conversation with Kwang-I Yu, president, CEO, and a founder of Paracel, the Pasadena, Calif.-based bioinformatics hardware provider. A former fellow at TRW, from which Paracel was spun off in 1992, Yu founded and directed Coyoteworks, TRW's center for high-speed computing systems. He holds a PhD in computer science from Cal Tech. Paracel was originally founded to commercialize advanced information filtering technology. At the conference the company unveiled its latest product, GeneMatcher, a very high-speed computer optimized for genomic data analysis.
BioInform: What are some of the most interesting things you've seen here at the conference?
Yu: The sheer breadth of activities. There are a lot of people in this field, a lot more than a year ago, and a lot more people with very interesting product ideas. Most of those ideas are still in the prototype stage. There aren't as many completed solutions, but it will be really interesting to see, between this year and next, how many of these ideas have become solid products. Various people in the bioinformatics business have aspects of their software that I think are very interesting. Lots of people say they're going to provide a whole breadth of products, but what you see is some specific pieces that give you hints, and some of those are really nice. Certainly the sequencer people are showing some interesting things here.
BioInform: How do you want to position Paracel among bioinformatics companies?
Yu: It's not going to be just one or two companies that end up providing monolithic products for the whole industry; a number of players will come up to meet a vast array of needs that are all happening at the same time. Every company that's going to succeed will have some area of specialty, and ours is in data analysis. We hope to change the paradigm of how people do data analysis.
Up to now what's happened is that you typically used heuristic algorithms. They're getting better all the time, but there is a distinct distance between what you can achieve in terms of sensitivity and selectivity with heuristic algorithms and with the more optimal dynamic programming algorithms, such as Smith-Waterman and hidden Markov models. As with the beginning of any computing field, people chose to conserve computing power over precision and recall, selectivity and sensitivity. You also did not do many kinds of things, such as very large-scale cross-referencing of all databases.
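For readers unfamiliar with the dynamic programming approach mentioned here, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is an illustration only, not Paracel's implementation: the function name and the match, mismatch, and gap values are assumptions chosen for the example, and it uses a simple linear gap penalty. The nested loop over every position of both sequences is what makes the method expensive compared with heuristic searches.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    Only two rows of the scoring matrix are kept in memory at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))         # 8: four matches
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6: shared "ACG"
```

The work grows with the product of the two sequence lengths, which is why exhaustive searches of this kind against a whole database were traditionally slow on general-purpose machines.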
All these algorithms pick up the same 80 percent. The sensitive ones pick up another 10-15 percent, and that's where you're different from your competitors. When people really need to search with the sensitive algorithms, they expect to sit for hours waiting for the results. What you want is to be able to press a button; not just choose whether to use optimal or heuristic algorithms, but to know whether Smith-Waterman or hidden Markov is better in this or that condition. What we think you want to do is take all these algorithms and run them all in parallel and get the results back in interactive time. You also want to be able to take all these algorithms, each of which has slight differences in sensitivity, and be able to continuously cross-check databases, to see what new things have popped up.
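The idea of running several search methods on the same query and cross-checking their results can be sketched as follows. This is a toy illustration under stated assumptions: the two search functions are hypothetical stand-ins (an exact-substring test and a shared k-mer test), the database is a small dictionary, and Python threads stand in for whatever parallel hardware would be used in practice.

```python
from concurrent.futures import ThreadPoolExecutor


def exact_substring_hits(query, database):
    """Hypothetical strict method: hit only if the whole query occurs in the entry."""
    return {name for name, seq in database.items() if query in seq}


def shared_kmer_hits(query, database, k=4):
    """Hypothetical looser method: hit if the entry shares any k-mer with the query."""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return {name for name, seq in database.items()
            if any(kmer in seq for kmer in kmers)}


def run_all(query, database, methods):
    """Run every method on the same query concurrently; return {method name: hit set}."""
    with ThreadPoolExecutor(max_workers=len(methods)) as pool:
        futures = {name: pool.submit(fn, query, database)
                   for name, fn in methods.items()}
        return {name: set(future.result()) for name, future in futures.items()}


def compare_hits(results):
    """Return the hits every method found, and the hits only one method found."""
    shared = set.intersection(*results.values())
    unique = {}
    for name, hits in results.items():
        others = set()
        for other_name, other_hits in results.items():
            if other_name != name:
                others |= other_hits
        unique[name] = hits - others
    return shared, unique


if __name__ == "__main__":
    database = {"s1": "TTTACGTACGTTT", "s2": "GGGACGTTCGGG", "s3": "CCCCCCCC"}
    methods = {"exact": exact_substring_hits, "kmer": shared_kmer_hits}
    shared, unique = compare_hits(run_all("ACGTACG", database, methods))
    print("found by all methods:", shared)
    print("found by one method only:", unique)
```

In this toy data both methods find s1, while only the looser method finds s2, which mirrors the point that the methods agree on most hits and differ on the remainder.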
If you're able to run really, really fast--over 1,000 times faster than a big, general-purpose machine--then it's not an issue of whether it's a little bit faster or not, it's an issue that you will, on the data analysis side, do things that are different. I don't choose among algorithms, I run them all. I don't run a heuristic algorithm just because that's the only thing I can afford; I run them all. I don't have a search strategy that's very complicated, such as where you take a database and try to cut it down, then you take what's left and try to cut it down again, and 17 steps later, then you have a search. What we want to do is brute force. It's easier to solve the problem of having enough computing cycles than to solve the problem of how to make these solutions palatable to biologists. The biology is harder than the computer science, from our perspective. Therefore, why don't we put enough horsepower into the machines and optimize around the biologists, not around the computers.
So that's our niche. It's not the only area of data analysis we'll be in, but we want to be the very best in data analysis because we come from a slightly different perspective on how to view this problem. I hope that our user interfaces are simpler and the data overload is less because we have 1,000 times more computing power. Somebody yesterday said a very illuminating thing. He said, our problem today is we have too many targets to look at and not enough biologists. Why would I need more computing power, which will get me more targets?
Our response is that we've been in the business of information filtering for a long time. When you have lots and lots of computing power, the fact is that you have fewer and better targets to look at, not more targets. The whole idea of computer power in data filtering is that you get out of data overload. So we hope to begin with this specialty and then find and build our areas of strength in a field where I think there are going to be several good suppliers with different strengths. We have a rather open view of the world we work with. There are many different people, but we don't view it as primarily a competitive world, but more as one where everybody's scrambling so that our customers have adequate solutions quickly enough. And if that's so, then it's going to work out fine for Paracel.
BioInform: What's your current mix between hardware and software efforts, and how do you see that evolving?
Yu: If you look at our engineers, approximately 25 percent are in hardware and 75 percent in software. That's always the case, even if you just build software to control hardware. It's still a significant software effort and clearly we and everybody else need to provide more complete solutions--not the whole solution, but more parts of it. So we will be building data analysis applications as well. I think the three-to-one ratio is going to stay the way it is. If anything, the software side and the biology side will increase in size.
BioInform: A lot of companies are entering this market. What's it going to take to come out on top in bioinformatics?
Yu: Really first-rate engineering. The biotech and pharmaceutical companies are unlike, say, universities and research houses; they're going to have low tolerance for nonindustrial-strength products. So companies must not only come out with the most compelling products, but with the ones that integrate the best into the overall environment, that are scalable to the size that the customers will need, and that are very reliable. That will be critical.
We've had quite a bit of experience doing that. We've provided systems for the US government for other kinds of information processing, but there were very similar kinds of solution methods, of approaches. We have a system, for example, that has, by now, over 2 trillion bytes online, that has a couple of thousand users, with something like half of them on at the same time. We filter 100,000 complex profiles at the same time, continuously; several gigabytes of live new data a day cramming into this thing. That system runs 24 hours a day, seven days a week. As you can imagine, with 1,000 analysts on it, if it breaks down for two hours, we're going to hear about it.
So we've had quite a lot of experience building that. We've also built commercial systems of that scale. We had a new but flourishing internet information services business that aggregated about 3,000 publications, newspapers, journals, etc., and supplied them on a subscription basis to major corporations--Cisco, Motorola, Microsoft, Hewlett-Packard. We divested that business to concentrate on bioinformatics, but that system has over 100,000 Fortune 500 corporate subscribers over the web, with the most reliable data center in the business. Again, with 100,000 subscribers, if the system goes down, we lose business. When companies pay hundreds of thousands of dollars to buy our service, they expect a certain level of reliability. The heart of that system is also relational databases.
So we have some competencies in very large systems, very reliable systems, in database management. We have, certainly, a lot of experience with the web, which is the medium through which the scientists will work. Now that we're focused on bioinformatics, we'd like to bring some of these skills into this field. We like to think of ourselves as being very good at large-scale, reliable systems. It's not an issue of here's a really interesting box. We have a track record building systems of the scale that we think pharmaceutical companies are going to need, which we're not sure many of our competitors actually have, and we have a very sizable and experienced engineering staff.
BioInform: Collaborations seem to be the name of the game in bioinformatics. What's your strategy for choosing collaborators and for maximizing the value of those partnerships?
Yu: We're going to pick a relatively small number of collaborators from among our customers, from other people in the industry, and from universities. But with each of those we intend to have fairly deep relationships, so that we have considerable resources to invest into these collaborations. We've always been financially fairly strong; we have an ongoing business with a substantial backlog and we sold the internet business for a lot of money, so we plan to invest fairly heavily. Each collaboration is one that we treat seriously and put manpower into. Some of them are just convenient; Cal Tech, for example, is right next door to us. We recruit from them. That's a natural place for us to have a collaboration with a university.
BioInform: Speaking of recruiting, how is Paracel addressing the challenge of finding good people in this market?
Yu: I think we're pretty attractive. As with everybody else, recruiting is a big issue for us, but we've had reasonable luck. We have a breadth of technologies that are interesting to people who are really good scientists and engineers. It's an attractive thing to say I'm a biologist, I'm interested in software. And lo and behold, here's a company that will give me 1,000 times the computer bandwidth, so I can really try some new things in software without worrying about the bottleneck. That combination is interesting to a lot of people.
We've also not necessarily concentrated just on people who are in the bioinformatics area. We recruit the best athletes, so to speak. You want to get the best computer people, who can build reliable software. You want to get good biologists, with a good view into the genomics field. You certainly want people with prior experience in bioinformatics, but this field is so new that if you've got the best people, we feel that's a good way to go, as opposed to exclusively wanting to hire people with a bioinformatics background. It brings in diversity, and that's what we have, good diversity. We want bioinformatics people and we also want other people with biology or computer science skills, hardware and software skills, which are hard enough to find by themselves.
BioInform: How important are object-oriented technologies to Paracel's products?
Yu: I think the object-oriented view of the world is pretty important, because it's a fairly complex view, so therefore you want to be able to carve the world out into objects and have people be able to view and handle it that way. As far as actual implementation, we would use object-oriented approaches for things like a client-user interface, but when you talk about really large-scale systems, we use straight ANSI C cross-compiled for several platforms, because the efficiency of systems is very important and you can easily lose an order of magnitude of performance by not worrying about the engineering aspects of things. So our major systems, the bulk of our software, are straight ANSI C, but in the user interface and the front ends things like Java and object-oriented programming tools come into play, where portability is important and the user's viewpoint is important, but performance is less of an issue. I think that's pretty close to the main industry approach; Oracle's and Microsoft's industrial-strength stuff is built in C.
BioInform: What are the most important bioinformatics trends that you see today?
Yu: First is clearly the shift from what was primarily a very successful global scientific research effort into exploitation. Within 10 years, I think, all new drug design is going to be genomics-related. Here is a $300 billion pharmaceutical industry, and a very fast-growing biotech industry to support it, that needs to shift in mid-gear. You have a $300 billion industry spending $30-40 billion a year in R&D: huge efforts, no tools. There are very good sequencers around, good instruments; there isn't very much software to support it. So everybody's scrambling.
When you build a "bioinformatics group" in a pharmaceutical company, most of those guys are just keeping the system alive, doing manual translations of, say, Incyte's protocol vs. GenBank's protocol. There isn't a whole lot of energy left over to talk and improve their lives; they're barely keeping this stuff bandaided together. Lots and lots of industrial-strength systems need to be built.
That's very different from a university arena, where you have a few very smart people with great tolerance for public domain software, who are going to be willing to do a lot of computer-sciencey things just to get something working. This is industrial production, large-scale sequencing, manipulation of databases, analysis. Another trend that's related is automation. The most expensive commodity is scientists, and right now they're operating at a fraction of full efficiency because there aren't enough tools. That's why we and other people are entering the bioinformatics field. Those are some of the obvious trends.
It's really interesting because I came from the aerospace industry, where 30 years ago, there was a period that was much like genomics today. All of a sudden huge numbers of scientific breakthroughs were being applied to the aerospace industry, and they were pressing the state of the art in multiple areas, for example to build the moon landers. That's a nontrivial endeavor with the technology they had 30 years ago. One thing that happened in those days was that there was an unprecedented collaboration among industry, the government, and academia. A lot of that clearly got lost around the Vietnam war and so forth, where industry shot off; Microsoft doesn't talk to the government or academia.
But in the bioinformatics field I see something very similar to the aerospace industry in its glory days. There are lots of things being invented. You go look around this conference, there are a bunch of guys from academia hanging around with a bunch of guys from industry, like us--we build systems, right?--hanging around with a lot of customers, major pharmaceuticals and biotechs who are going to need to shift gears in midstream and turn around in very, very fast order a whole new generation of technology. It's really a great place to be.
BioInform: Where would you like Paracel to be five years from now?
Yu: We would like to be one of the leaders in bioinformatics. We would like to have contributed to this shift where people are not so much worrying about computing cycles, or worrying about the efficiency of the scientists, which means that, I hope, in five years the bioinformatics field is a lot more automated than it is today, with industrial-strength tools. And I hope that by that time it will be clearly recognized that Paracel was a key contributor.