SAN LEANDRO, Calif.--BioInform recently spoke with Dennis Smith, senior vice-president and chief scientist of bioinformatics software provider MDL Information Systems here, about the company's strategic directions and trends he has observed in the bioinformatics market. Smith has been with MDL, where he now heads the company's efforts in bioinformatics and information integration, for the past 10 years. He previously worked at Intelligenetics, where he directed bioinformatics research and served as associate principal investigator for the BIONET National Computer Resource, and with Lederle Laboratories.
BioInform: What types of challenges are life sciences companies facing today?
Smith: When I talk about life sciences I'm including the pharmaceutical industry, agchem industry, and biotechnology companies, and even some chemical companies, interestingly enough, because some of them are beginning to look at genomics as a way of, for example, studying microorganisms that cause paint fouling on ship hulls and looking for effective treatments that are environmentally friendly and deal directly with the organism.
At any rate, one of the major problems facing these industries is the relatively low productivity of their research effort. If you look at the number of new chemical entities approved in this country by the FDA over the past several years, the trend has been downward, despite the fact that those years have included significant increases in R&D budgets. Although the trend has gone up a little bit lately, it's still not a situation that the management of these companies has found acceptable. If you set the objective of running a growing business on three to four significant new chemical entities each year, productivity falls well below that, which some people have cited as a reason for the mergers and acquisitions that have taken place recently.
The industry response has been quite interesting. They've chosen to add a great deal of automation to produce high-throughput methods in some of the basic research areas, looking for an industrial-process approach to some of the research activities: something that's reproducible, that can be run again and again, and run at high throughput. This does not guarantee a predictable process in the sense that new chemical entities will automatically result, but at least it automates what had been very time-consuming activities that people had been involved in before.
The counterpart to this objective is to free human intellectual resources to try to make sense of the data these systems produce, rather than actually generating data. So in genomics we have high-throughput sequencing. In biology there are high-throughput screening systems, using a lot of robot technology. In chemistry there's the area of high-throughput combinatorial chemistry or automated synthesis. All of these processes are generating a great deal of additional data and are subject to a lot of automation, some of which has already taken place.
Once the systems are in place, the question becomes how to get people involved in the process of transforming the data into information they can use to make better decisions. So in each area we see a strong focus on informatics--bioinformatics for the genomics data, various screening database management systems to manage the assay data out of biology, and various chemical database systems to manage the combinatorial chemistry and automated synthesis flow of information. Those are areas MDL either has been in or, as with the area of genomics, has begun to invest in, in terms of offering products. We want to attack the problem of allowing people to look at information across those disciplines in a more integrated fashion. That's really our business.
BioInform: Specifically in the area of bioinformatics, where has MDL targeted its product development efforts?
Smith: The capture and management of the basic flow of sequence data, up through automated analysis, continuing to interactive analysis of the results. We do not provide systems that work directly with the instrumentation, in terms of collecting the basic sequence data from the sequencers and doing the initial processing of those data. Rather, we pick up the flow of data downstream of that. We define the area as the capture of at least partially processed data from the sequencers, the organization of that combination of public and private data, and then making those data available for a variety of automated and end-user analysis processes.
BioInform: How does MDL fit into the picture with other software companies in the bioinformatics market?
Smith: We're going about this problem with a larger objective in mind, something I think we are uniquely qualified to do. We are viewing activities in the area of genomics as very important in and of themselves, and looking at ways to provide bioinformatics solutions to support those efforts. Everything we're doing is in the context of where that information will be used throughout the research process. So rather than viewing it as an activity that is an island unto itself, we're looking at it as something that has to work in conjunction with the biological assay systems and chemical database systems in order to provide the kind of decision support environment that customers are telling us they want. And that's an area that, because of our experience and customer base, we can really address. Many other companies can do a good job in perhaps one area or another, but meeting customers' overall information needs is a real challenge, and I think we have the opportunity to do that.
BioInform: What important trends are you seeing emerge in bioinformatics software?
Smith: Clearly the focus is on the proteins that actually do the work. Having the DNA sequence databases and analysis tools in place is a critical foundation, but in the biological systems along the metabolic pathways the proteins are doing the work, so clearly the focus is strongly on protein structure-function relationships. Sometimes this is called functional genomics, an umbrella that includes DNA and RNA; other people refer to it as proteomics, obviously with a strong focus on just the proteins. That's clearly a major area of interest today.
A related area is differential expression, either of RNAs or of proteins. That's a way to measure more directly the differences between a system in a normal state and one that is challenged. The challenge could be an environmental threat, a carcinogen, or a disease state, whether or not it's caused by some genomic difference. Differential expression has received a lot of attention, and a lot of new and exciting technology is coming on line. It's an area of rapid development; there's no clear, truly superior technology, so I think people will be trying a number of different ones to try to get a handle on these key pieces of information.
And that ties directly back to proteomics or functional genomics. Once one has an idea of differences in systems, then one needs to be able to answer the question: what are the differences at the protein or DNA level, and what are the potential structural and biochemical pathway implications of those differences? Obviously that's getting really close to making a decision about where to intervene in that biochemical process and produce a specific treatment.
BioInform: How important is the issue of platform independence in bioinformatics software today?
Smith: Very important. There are two components; one is the end-user interface. People haven't all chosen to buy the same thing; some have PCs, Macintoshes, Suns, SGIs, and so forth. For scientists who have to make use of these systems, truly web-based technologies that offer a substantial degree of platform independence are important, and that's what we have chosen as the delivery mechanism for almost all of the analysis tools we provide, simply because it's much easier from the customer's point of view, and also from the developer's point of view.
BioInform: Is MDL involved in the effort to create a CORBA-based interface definition language standard?
Smith: Yes, we've been involved in the meetings and will continue to expand our participation. This initiative helps to define how different systems talk to each other, largely in the background. Although it's a true peer-to-peer definition, most people will be using it as a client-server definition, so it's really an enabling technology to help various database systems on different platforms to communicate.
On the server side there's obviously the same degree of heterogeneity--many different hardware platforms, many different database systems--but no dominant technology has emerged that you need to work with and develop for, as HTML has been for the web-based interfaces. So that's why there are initiatives like various middleware systems--our ISIS/Host being one--with various new standards such as CORBA coming into play, because it's a tough problem. They're going to continue to proliferate; no one's going to want to even think about controlling it, so the question is how to keep it together.
The one thing people need to keep in mind is that CORBA only solves part of the problem. It's a communication standard that establishes standard objects and some standard methods that can communicate with one another across platforms. It doesn't solve some of the basic database integration issues, such as how to do simultaneous searches over several databases on different platforms, with different database systems, and get a set of results back to the end user in a reasonable length of time. There's a lot more work to be done beyond CORBA, but that's a very interesting initiative.
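[Editor's note: The federated-search problem Smith describes — fanning one query out to several heterogeneous databases and merging whatever comes back within a time budget — can be sketched as follows. This is a minimal illustration, not MDL's implementation; the per-database query functions are hypothetical stand-ins for whatever backend (relational system, flat-file index, remote object) each site actually runs.]

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical backends: in practice each would wrap a different
# database system on a different platform.
def query_sequence_db(term):
    return [f"seqdb:{term}:hit1"]

def query_assay_db(term):
    return [f"assaydb:{term}:hitA"]

def query_structure_db(term):
    return [f"structdb:{term}:hitX"]

def federated_search(term, backends, timeout=5.0):
    """Query several databases in parallel and merge the results
    that arrive within the time budget."""
    results = []
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = [pool.submit(backend, term) for backend in backends]
        for future in as_completed(futures, timeout=timeout):
            try:
                results.extend(future.result())
            except Exception:
                # One slow or failing backend shouldn't sink the whole search.
                pass
    return results

hits = federated_search(
    "kinase", [query_sequence_db, query_assay_db, query_structure_db]
)
```

The hard parts Smith alludes to — schema mapping across systems, ranking merged results, and keeping response time reasonable when one source is slow — all live outside what a transport standard like CORBA defines.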
Coming in the next issue of BioInform: part two of our exclusive interview with Dennis Smith.