Oracle has named its new database software “10g” to highlight its compatibility with grid computing, and abandoned the “i” suffix that characterized its product releases during the Internet boom. But the company claims that there’s much more to 10g than the grid, especially for its life science customers.
The 10g database, launched at last week’s OracleWorld meeting in San Francisco, is the first release to contain features specifically targeted at the life science market — a customer base that Oracle began courting heavily in the second half of 2001. The company set up a life science business unit and formed a customer advisory board two years ago to zero in on the field’s database requirements. Feedback from that initiative is now beginning to appear in its products.
As proof of its commitment to the bioinformatics community, Oracle has added a version of Blast to its suite of data mining capabilities, along with support vector machines and non-negative matrix factorization, two algorithms gaining popularity in the gene expression analysis community. In addition to the new algorithms, the 10g release also offers features for distributing, storing, and transporting very large biological datasets as well as several new capabilities for manipulating and analyzing scientific data.
“We think as data volumes grow it makes a lot more sense to move the algorithms to the data rather than to have to move the data around all the time,” said Susie Stephens, senior product manager for Oracle’s life sciences group. The company said these features will save time, but it certainly can’t hurt to give customers some additional incentive to keep their data in one place. As competition in the life science database market heats up, Oracle is betting that if customers can manipulate and analyze their data within Oracle, they’ll choose to stay there.
Maintaining Market Share
In 2001, Oracle held 90 percent of the life science database market without even trying. When the company took a sudden interest in the sector, observers saw the move as a response to IBM’s no-holds-barred entry into the market, a motive that Oracle didn’t deny. “Obviously it’s a threat,” Charles Berger, Oracle’s senior director of product management for life sciences, said at the time [BioInform 12-17-01]. “It’s IBM and they’re spending a lot of marketing dollars and they’re making some noise.”
Since then, IBM has chipped away at Oracle’s market share by including its DB2 database as part of broader hardware and services deals with life science customers. The competitors have also spent the past two years honing very different messages for the life science database market: While Oracle has encouraged users to migrate all of their data onto its platform, IBM has stressed a federated approach that relies on its DiscoveryLink middleware to integrate data from a number of heterogeneous platforms, including Oracle.
Jeff Jones, director of strategy for IBM Data Management Solutions, told BioInform that despite 10g’s grid computing claims, “the grid as defined by Oracle appears to be … a homogeneous network of Oracle databases. That isn’t the academic definition of grid, nor is it the view of grid we’ve taken … ‘Grid’ to us means larger than what we’ve been hearing from our competitors, and it means diverse.”
This, Jones said, “is at the heart of the difference between Oracle and IBM in the database space.” While IBM has opted for an environment where “heterogeneity is expected and embraced,” Oracle seems to be sticking with its “very centralized, all-Oracle, gotta-have-it-in-a-central-place-before-we-can-really-do-all-of-our-good-stuff-for-you strategy,” he said.
The pros and cons of these different options are certainly debatable, but if Oracle can convince its large customer base to stay faithful, its strategy could pay off in the end.
What’s in 10g?
Indeed, the 10g database release reinforces Oracle’s vision of a homogeneous environment by making it easier for customers to do as much work as possible without ever leaving its database. “We want to make sure that people don’t have to move their data out of the database a lot or move to a separate application just to do a small amount of analysis,” Stephens said.
In addition to Blast and the other data mining algorithms, the 10g release allows users to create regular expression searches to manipulate or find data strings, a task that “people are currently using Perl to do,” Stephens said. Previously, “they’d have to take the data out of the database, manipulate it with Perl, and then put the results back in. Now they don’t have to move the data out of the database at all.”
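To illustrate the idea of matching a pattern where the data lives rather than exporting it to a script, here is a minimal sketch. It does not use Oracle: it relies on Python's built-in sqlite3 module as a stand-in, registering a user-defined REGEXP function so the pattern is applied inside the query. The table name, column names, sequences, and motif are all hypothetical and chosen only for illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "value REGEXP pattern" as a call to regexp(pattern, value).
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


# In-memory SQLite database used purely as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table of protein sequences.
conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, residues TEXT)")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [
        ("seq1", "MKTAYNGSLLV"),
        ("seq2", "MKPPLLAAGG"),
        ("seq3", "GGNVTQQ"),
    ],
)

# Illustrative motif: N, then any residue except P, then S or T.
motif = "N[^P][ST]"

# The pattern match runs as part of the query; no rows are exported first.
rows = conn.execute(
    "SELECT id FROM sequences WHERE residues REGEXP ? ORDER BY id",
    (motif,),
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Only the identifiers of matching rows leave the database, which is the workflow Stephens describes. In Oracle itself the equivalent query would use the database's own regular expression functions rather than a registered Python callback.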
The new release also includes a feature called “transportable tablespaces” to quickly move very large datasets between different versions of Oracle on different operating systems, as well as a “network data model” capability that allows users to model relational data in a graph form using nodes and edges.
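The notion of modeling relational data as a graph of nodes and edges can be sketched in a few lines. The example below is not Oracle's network data model API; it is a plain Python illustration that assumes a hypothetical two-column table of protein-protein interactions, turns each row into an undirected edge, and then asks a typical graph question: the shortest chain of interactions linking two proteins.

```python
from collections import deque

# Hypothetical rows from a relational table of pairwise interactions.
interaction_rows = [
    ("P1", "P2"),
    ("P2", "P3"),
    ("P3", "P4"),
    ("P5", "P6"),
]


def build_graph(rows):
    """Turn (protein_a, protein_b) rows into an undirected adjacency map."""
    graph = {}
    for a, b in rows:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the traversal order is deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(interaction_rows)
print(shortest_path(graph, "P1", "P4"))
print(shortest_path(graph, "P1", "P6"))
```

Path queries like this are awkward to express over flat rows but natural once the same rows are treated as edges, which is presumably the appeal for pathway and interaction data.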
Most of the new features have been extensively beta-tested by life science customers, Stephens said. For example, the University of Kyoto has put part of the KEGG pathway database into the network data model format, and the University of California, San Diego, is using the network data model feature to map protein-protein interaction data. Myriad Proteomics is testing Blast and regular expression searching, and the Wellcome Trust Sanger Institute is also trying out a number of 10g’s new features.
Martin Widlake from the databases services group at the Sanger Institute told BioInform that the transportable tablespace option in 10g should prove valuable for a terabyte-scale database the institute is building to share raw trace files for genome sequence data. “To give that information to people at the moment, we have to put it onto tape and send the tapes to them,” he said.
The Blast and regular expression capabilities are also useful, Widlake said, but he noted that they are not suited for power users: “If you wanted to do serious amounts of processing, you’d still take the data out of the database, but a lot of the time, that’s not what you want.”
And 10g’s namesake, its grid computing capability, is also promising, according to Widlake. “We’re very excited with the idea of having the Oracle system look after the splitting of the data over whatever disks you make available to it,” he said.
The timeline for a complete rollout of 10g at the Sanger Institute is still uncertain, but Widlake said that several “major systems” would run on the new version of the database by next year.