Thure Etzold, managing director of Lion Bioscience, Cambridge, UK
Thure Etzold is living proof that the field of bioinformatics marches to the beat of an over-caffeinated drummer. The inventor of SRS, the data integration system marketed by Lion Bioscience, Etzold speaks about the technology with the air of a veteran: “SRS is so old,” he says of the software he first released in 1990. “But it’s been constantly moving,” he adds, lest anyone think SRS runs on punchcards.
Just over a decade ago, he offers as a reminder, before “the Internet happened,” biologists looking for ways to combine data from different resources had few, if any, options. A wetlab scientist at the Max Planck Institute for Plant Breeding Research in Cologne, Germany, studying chloroplast transformation, Etzold finally got fed up hopping between the EMBL, SwissProt, and PIR databases and began “playing around with search tools” to find a better way to link them together with a unified querying system. The result, the first version of the Sequence Retrieval System, linked only three databases, but when he presented the idea at three different institutes, it was revolutionary enough for its time to net him three separate job offers.
Deciding to bag his molecular biology PhD in favor of a new focus on the still-embryonic field of bioinformatics at EMBL, Etzold said he returned to Max Planck after his triumphant tour only to be confronted with a classic bioinformatics monkey wrench: The three databases had completely changed their file structures, format, and content, making his hot new system essentially obsolete. Not one to give up, Etzold said he “decided to start over and do it smarter.”
That’s when the “real idea” behind the technology was born, he said: to create a system that described the structure, format, and syntax of the underlying databases as much as possible so that when the databases were updated, SRS could change in lockstep. Metadata-based linking would allow extensive database cross-referencing to enable querying across any number of resources. Again, Etzold points out, this was long (in bioinformatics time) before metadata, XML, and ontologies were in common use. Soon, other databases, such as ProSite and the PDB, joined, and the idea of “a universe of databases” linked with explicit cross-references was within reach. Once the first web browser interface was added in 1993, “it really took off,” Etzold said.
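The description-driven approach can be illustrated with a small sketch. This is not SRS code: the database names, field tags, and the helper functions parse and follow_links are all hypothetical, chosen only to show how a parser that reads a format description, rather than hard-coding each file layout, can follow explicit cross-references between flat files.

```python
# Hypothetical format descriptions; none of these names come from SRS itself.
# Each one says which line tag carries the entry ID, which carries
# cross-references, and which line terminates an entry.
FORMATS = {
    "seqdb": {"id_tag": "ID", "xref_tag": "DR", "end": "//"},
    "protdb": {"id_tag": "AC", "xref_tag": "XR", "end": "//"},
}

# Toy flat files (invented data, for illustration only).
SEQ_TEXT = "ID S1\nDR protdb:P9\nDE toy sequence\n//\n"
PROT_TEXT = "AC P9\nDE toy protein\n//\n"


def parse(text, fmt):
    """Split a flat file into entries using only the format description."""
    entries = {}
    current = {"id": None, "xrefs": [], "lines": []}
    for line in text.splitlines():
        if line.strip() == fmt["end"]:
            if current["id"] is not None:
                entries[current["id"]] = current
            current = {"id": None, "xrefs": [], "lines": []}
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == fmt["id_tag"]:
            current["id"] = value
        elif tag == fmt["xref_tag"]:
            # Cross-references are assumed to look like "database:entry_id".
            db, _, target = value.partition(":")
            current["xrefs"].append((db.strip(), target.strip()))
        current["lines"].append(line)
    return entries


def follow_links(entry, databases):
    """Return the entries in other databases that this entry points to."""
    return [
        databases[db][target]
        for db, target in entry["xrefs"]
        if db in databases and target in databases[db]
    ]


databases = {
    "seqdb": parse(SEQ_TEXT, FORMATS["seqdb"]),
    "protdb": parse(PROT_TEXT, FORMATS["protdb"]),
}
linked = follow_links(databases["seqdb"]["S1"], databases)
```

In this sketch, if a database renamed its cross-reference tag, only its entry in FORMATS would need to change; parse and follow_links would stay as they are.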
But the system’s popularity had its price. The size of the project and its mounting user base soon made SRS unsustainable as an academic effort. EMBL, which was seeking to spin off some of its promising technologies at the time, approached Etzold about commercializing SRS, but he was initially reluctant to leap into the for-profit world. However, a compromise was soon reached that allowed him to move to Lion Bioscience in 1998 with his SRS team, while retaining his position as group leader at EMBL/EBI. SRS would remain free to academics, a sticking point for Etzold.
The result, he said, is “access to the best of both worlds” – for Etzold as well as SRS users. The continued involvement of academic groups has allowed the system to grow to support over 600 data sources, but that growth would not have been possible at all if the technology had not been commercialized, he noted. “When a software system grows, there comes a time when it’s falling apart,” he said. “You need to invest a considerable amount in software testing, development, new release planning – you have to keep so many things under control.” The time and effort required to keep the system up to date and well documented would not have been available had the technology remained within EMBL, Etzold said. Grants for maintaining existing software systems simply aren’t very easy to win.
The latest release of the system, SRS 7 Evolution, can handle the familiar flat files of the usual bioinformatics database suspects, but adds support for XML and relational database systems, as well as Perl and Java APIs and full 3D functionality. The wealth of technology options available to bioinformaticists today has only made the need for a system like SRS greater, said Etzold: “People store their data with XML or in Oracle, but they’re still not making use of it.” Citing CORBA as an example of a technology that “people thought would solve everything, but [that] created more problems than solutions,” Etzold warned that XML and relational database systems also bear their share of problems for users. A fan of the flexibility provided by old-school flat files, Etzold is wary of abandoning technology that works in favor of adopting “the next wave.” However, he noted, the company’s work with Incyte using XML has given him some new ideas on how to make the system run “blazingly fast.” And in bioinformatics time, that could mean light speed.