OAKLAND, Calif.--BioInform recently spoke with John Couch, the former Hewlett-Packard and Apple Computer executive recently named CEO of bioinformatics software provider Pangea Systems here. Prior to his appointment as CEO, Couch had been working with Pangea for a few months as an executive-at-large for Mayfield Group, the venture capital firm that led the company's financing. In this conclusion of the two-part interview, Couch talks about his definition of bioinformatics and his plans for Pangea immediately and in the future.
BioInform: What are your top priorities at Pangea for the next six months?
Couch: First I took a look at the industry and said, what are the common problems? What's the situation now? I think everybody agrees there's a tremendous amount of data. We've got chart after chart that shows the exponential growth; they all look like hockey sticks--protein data, sequence data. At one time the big challenge was finding a gene target. Now it's more target validation. There are many different types of data--sequence data, expression data, protein data, small-molecule chemical data, etc. And then there are certain islands of data, even within pharmaceutical companies. Because of the traditional, functional way in which they were organized, the data exist in different groups and don't reside in a common data warehouse, so you don't get the benefits of scale that you would if those data were integrated.
I also have looked at some of the software, and most of it is academic; there's very little commercial-grade software. Those of us who have run very large software organizations have a lot of people in the areas of quality assurance and testing, and I don't see a lot of that. Again, we're in that early adopter phase where the customer does the testing. And a lot of software is just developed in incompatible environments. Some is developed in LISP, other software is developed in Visual Basic, other software is developed in C++. To me that's a problem, and it's not unlike where the PC world was back in '77 and '78.
With that in mind, I tried to define what I felt bioinformatics really was, and I came up with a maturity matrix. The first layer is the actual data generation; someone has to set up the process to generate unique experimental data. From there, there's data capture. Someone has to capture those experimental data in a standard, open-architecture repository so that multiple avenues of access can get to it. The third area, data filtering, is where the data need to be cleaned up in order to produce more accurate information. From the filtering I then go into data analysis, where you analyze experimental data and compare them with other published data.
The fifth component, underlying everything here, is data integration, where you must integrate the many disparate data sources into a single environment. Once you have that then you can start data mining, which is the extraction of pertinent knowledge out of these massive data sets. Then finally, you need to model and simulate. That's where you place these data into a biological context that allows for visualization in a natural biological state. To me it's like the graphical user interface. Back in the early Apple days they said look, this is crazy, we're asking people to memorize slashes and commas in these textual commands, why don't we just emulate the way they act at an office, and that's why we created garbage cans and file folders--icons from the office. Well the same thing has to be done given the biological objects people are working with.
So I define data generation, data capture, data filtering, data analysis, data integration, data mining, and data modeling and simulation as bioinformatics. With that in mind, our focus at Pangea is really in three areas. Our business strategy is, first and foremost, to create a platform, an open architectural system that allows for storing these different sets of data. We've spent quite a bit of money, and we probably have well over 40 computer scientists working in this particular area to create a platform, an open system, that allows for the storing, access, and retrieval of all of these types of data.
In addition to that, we have had to write applications for the filtering and analysis of these data. No company is going to be able to write every application required for this industry, but we felt that we had to write a number of applications to verify the robustness, the openness, the flexibility of that platform. So we have one app, called GeneWorld, that does the filtering and analysis of data and integrates into the platform's data warehousing capabilities. The other area is actually getting additional, novel content into that warehouse, so we're forming relationships with certain data generators, labs that generate unique types of data. In addition, Pangea has novel data sets of our own that we call IC2 data, for integrated computational content: we take raw data and, through scientific analysis, mine and classify them so there's more value.
Visualize a wheel. The hub of the wheel is the platform, which is a hierarchical platform with the data repository being the base and common element. And around the hub, the business strategy is threefold. One, we must form strategic alliances with people who have unique data, so that the data can be stored in that open architecture. We will also add our own unique content to that architecture. The second component of the strategy is in the application area. We will develop a few unique applications that prove the robustness of our platform, then we will work with other application companies to ensure that the data that are created out of their applications can be stored and retrieved within a common platform, and also with the scientific and academic community such that we can start to integrate their algorithms and data within the standard platform as well.
The third category, which is really important, is systems integration. That's a set of tools that we've written that allows for the migration of a company's proprietary data into this data warehouse, and a set of tools that allows for the wrapping of a company's proprietary scientific analysis algorithms to run on top of this platform, so they can mine the data. For example, we have two products that are going to test sites in November. One is called GeneWorld, which is filtering and analysis software. It integrates into a second product that is really a content product, in that we've taken five of the very large public-domain databases and brought them all in-house, normalized all the data, and cleaned up the annotations such that if you do a search or analysis, you now do that across all five public databases simultaneously. And the systems integration tools will allow a pharmaceutical company or a biotech company to migrate some of their own proprietary data into the same reservoir that all of these normalized public data are now sitting in, so their analysis software can cut across all the different data sets. They can run one query and get results from multiple databases.
So the business strategy is the platform, the content, the relationships with the third-party software houses, and then a systems integration group. I've formed an organization that works with companies to migrate their data and their unique proprietary analysis into this environment. This is not unique for the computer science industry, but it may be unique for the new, emerging bioinformatics industry: a common platform, relationships with third-party developers and content providers, and an organization that can help customers migrate their proprietary data into a data warehousing environment such that there is one repository for a company's data as well as the public data, the enterprise server solution that can then be served up to a thin-client desktop.
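The single-repository idea described above, in which public databases are normalized into one store and a company's proprietary records are migrated into the same store so that one query cuts across all of them, can be illustrated with a minimal Python sketch. Everything named here (`Record`, `Repository`, `migrate`, `search`, the source labels, and the sample rows) is hypothetical and invented for illustration; it is not Pangea's API or data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    source: str       # which database the record came from (hypothetical labels)
    gene: str         # normalized identifier
    annotation: str   # cleaned-up free-text annotation


@dataclass
class Repository:
    """One store holding both public and proprietary records."""
    records: list = field(default_factory=list)

    def migrate(self, source, rows):
        """Normalize (gene, annotation) rows from one source and load them."""
        for gene, annotation in rows:
            self.records.append(
                Record(source, gene.strip().upper(), annotation.strip())
            )

    def search(self, gene):
        """Return every record for a gene, whatever source it came from."""
        key = gene.strip().upper()
        return [r for r in self.records if r.gene == key]


# Made-up sample rows: two "public" sources and one "in-house" source.
repo = Repository()
repo.migrate("public_db_a", [("genex ", "placeholder annotation 1")])
repo.migrate("public_db_b", [("GENEX", "placeholder annotation 2"),
                             ("GENEY", "placeholder annotation 3")])
repo.migrate("in_house", [(" GeneX", "placeholder internal note")])

hits = repo.search("genex")
print(sorted({r.source for r in hits}))
# prints ['in_house', 'public_db_a', 'public_db_b']
```

The point of the sketch is only that normalization happens once, at load time, so a single search reaches every source without the caller knowing how each one formatted its identifiers.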
BioInform: You have two new products going into testing?
Couch: They're going into beta testing in November. And then we are working with a few early adopters, strategic partners, in defining what the computer science industry calls APIs--application programming interfaces. In other words, they're the gateway into the platform, so that partners can add their own unique data into this repository and integrate their own unique analysis strategies into GeneWorld.
BioInform: Any other new products in the pipeline?
Couch: In the spring we will launch the platform itself as a product such that you can create your own Gene Thesauruses, which is our content introduction, and create your own GeneWorld-type analysis routines. The platform is crucial. When I look at this industry I see an awful lot of proprietary environments where the company says, you buy our data and then we're the sole source of analysis software. You're locked in. But in my conversations with the pharmaceutical companies and a lot of the biotech companies, they really don't want to be locked up with any one company. They want a set of tools, they definitely want a repository for their data, but then they want a set of tools that allows them to access those data, to store unique data into it, to mine those data, and to visualize those data in many different ways, whether it be through an evolutionary tree or a reporting format.
Pangea's goal is to provide a set of very low-cost tools--viewers, algorithms, datasets--that really allow a company to move faster. We're probably going to spend up to $20 million building this open architecture. At most companies that I've talked to, the software has existed to support the scientists, and they're starting to realize that the cost of not only developing that software, but testing and maintaining it, is high. They're starting to say, what business are we really in, the science and drug discovery business or the software business? So they're looking for some industrial-strength commercial tools, a back-end system, an infrastructure that allows them to write the unique, proprietary analysis software that's required by their scientists, without having to solve all the world's data warehousing problems.
I started working with Pangea in June as a consultant out of Mayfield, and my first hire was James Dai, who was in charge of worldwide software development for Informix, and before that he was with Oracle. That gives you an idea of the type of management team that we're bringing in. I have another hire that will start in November who comes with 17 years of experience in sales and marketing in the enterprise server area. So I'm pretty excited about the opportunity to create a new market and stake a flag out there that says Pangea is the leader of this market.
Looking at our products, we have two applications and an integrated set of data that we've completed. One application, called GeneMill, has not been released to the world, but we have some strategic partners that are using it. It takes care of capturing data at the data generation level. GeneWorld handles the filtering and analysis, the platform itself handles the integration of all the data, and then we have a number of viewers that are part of that platform that allow the modeling and simulation and some of the mining of those data. So I feel like we have a strategy and a set of software offerings that span that ladder from data generation to data modeling and simulation.
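The seven-layer ladder and the product mapping Couch describes can be written down as a small data structure. The sketch below is one reading of the interview, not a specification from Pangea: the `Stage` enum, the `COVERAGE` table, and the `uncovered` helper are hypothetical names, and assigning GeneMill to data capture alone is an interpretation of "capturing data at the data generation level."

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven layers listed in the interview, in the order given."""
    GENERATION = 1
    CAPTURE = 2
    FILTERING = 3
    ANALYSIS = 4
    INTEGRATION = 5
    MINING = 6
    MODELING_AND_SIMULATION = 7


# Interpretive mapping of offerings to stages (an assumption, see lead-in).
COVERAGE = {
    "GeneMill": {Stage.CAPTURE},
    "GeneWorld": {Stage.FILTERING, Stage.ANALYSIS},
    "platform": {Stage.INTEGRATION},
    "viewers": {Stage.MINING, Stage.MODELING_AND_SIMULATION},
}


def uncovered(coverage):
    """Return the stages that no listed offering addresses, in ladder order."""
    covered = set().union(*coverage.values())
    return [s for s in Stage if s not in covered]


print([s.name for s in uncovered(COVERAGE)])
# prints ['GENERATION']
```

Under this reading, only data generation is left without an offering, which is consistent with the earlier remark that Pangea is forming relationships with labs that generate data.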
BioInform: Five years from now what would you like to have under your belt at Pangea?
Couch: Five years from now? They used to ask us that at Apple and we'd say, the personal computer will be a consumer product and we'll be a billion-dollar company, and they laughed and asked what we were smoking. My vision really centers around the word integration. If we are going to benefit society in terms of creating unique drugs that are effective because they map the genetic makeup of an individual, then we are going to have to supply a software environment. We have an internal code name for it, Pantegra, which brings in the idea of sort of an integrated world. That's got to be in place. If people continue to operate in islands of isolation, this industry isn't going to move at the pace that it ought to.
My vision is really to have a common, open architectural environment that supports many data sets, many types of applications, and visualizations of those data, such that all of these data that are being generated, all of this knowledge that's being created, can be accessed very quickly and inexpensively, so that it's not a domain owned by a few but something that really empowers the scientists. I'm starting to sound a little like Apple in the early days, where our goal was to empower the individual, empower the knowledge worker--and I think that's really what we want to do here: empower the scientists, allow the scientists to have at their fingertips, via the thin-client desktop, access to integrated computational content, to the latest algorithms coming out of the universities, and to many ways of visualizing those data.
We've moved from the challenge of generating targets to where we have to validate those targets. Most of the pharmaceutical companies have way too many targets. They've got to narrow that down and take it to the next step. So that's our vision, that we would be successful in supplying an environment that speeds up the process of understanding the genetic makeup and the appropriate drugs that can transform those mutated genes back into healthy ones.
We used to say at Apple, the reward's in the journey, and we've certainly got a journey here. The excitement really is in the relationships that we make along the journey. We're excited to be in an industry that is just being defined. Bioinformatics is a new field, it's going to grow. It's going to be an exciting time.