The National Center for Genome Resources in Santa Fe, New Mexico, is enlisting commercial partners in two consortia to populate its metabolic pathways database, PathDB, with plant, mouse, and human data. Licensing arrangements for PathDB and its associated software toolkit are expected to be in place by the first quarter of 2001.
Jeffrey Blanchard, project manager for PathDB, said that about four companies are expected to sign up for the plant consortium by January 15, each committing to contribute $175,000 per year for two years toward the database population effort. Putting together the human and mouse consortium, which he expects to include about five firms, will take a bit longer, he said, and he has set a closing date of March 15, with a commitment of $400,000 per year for three years. He declined to name the companies involved.
Consortium members will be licensed to port the database to their own servers. “A lot of them want the database behind a firewall,” Blanchard said, citing concerns about query records being collected and perhaps sold to competitors. “We don’t do that,” he added, “but some of the companies have policies about outside queries.”
Separate licensing arrangements will be available for companies that do not belong to either consortium. “But they’ll probably be pretty expensive,” Blanchard cautioned, “because we’re primarily concerned with getting groups into the consortium and getting the database populated, and we don’t want to undercut that.”
PathDB resides at NCGR on a Sun server running Sybase, but will be ported to Oracle to accommodate users, Blanchard said. Its continually updated contents currently focus on Arabidopsis data, along with a small quantity of data on bacteria. Populating the database is a labor-intensive process, and has so far consisted mainly of mining Arabidopsis genome annotations and the primary literature.
The database holds information on chemical compounds, reactions and transport steps, proteins, and the sequence of steps that make up each metabolic pathway. It is organized by taxonomy: each pathway and protein is labeled with the taxonomic classes in which it occurs. PathDB currently includes approximately 140 “classical” pathways, such as glycolysis, as well as 1,500 transport steps and 1,000 proteins.
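To make that organization concrete, here is a minimal Python sketch of how such entities might relate to one another. It is not PathDB's actual schema, which the article does not describe. The class names, fields, and the `pathways_in_taxon` helper are all hypothetical, and the sample record is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Compound:
    name: str


@dataclass(frozen=True)
class Step:
    """A reaction or transport step turning substrates into products."""
    name: str
    substrates: tuple
    products: tuple
    kind: str = "reaction"  # hypothetical flag: "reaction" or "transport"


@dataclass
class Protein:
    name: str
    catalyzes: list = field(default_factory=list)  # names of Steps
    taxa: set = field(default_factory=set)         # taxonomic labels


@dataclass
class Pathway:
    name: str
    steps: list                                    # ordered Steps
    taxa: set = field(default_factory=set)         # taxonomic labels


def pathways_in_taxon(pathways, taxon):
    """Return the names of pathways labeled with the given taxon."""
    return [p.name for p in pathways if taxon in p.taxa]


# Illustrative record: the first step of glycolysis.
glucose = Compound("glucose")
g6p = Compound("glucose 6-phosphate")
phosphorylation = Step("glucose phosphorylation", (glucose,), (g6p,))
hexokinase = Protein("hexokinase", [phosphorylation.name], {"Arabidopsis thaliana"})
glycolysis = Pathway("glycolysis", [phosphorylation], {"Arabidopsis thaliana"})
```

In this sketch, a call such as `pathways_in_taxon([glycolysis], "Arabidopsis thaliana")` filters pathways by their taxonomic label, which is the kind of lookup a taxonomy-organized database would support.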
A recently released second beta version of the Java client software, which includes query and visualization tools, may be downloaded from the NCGR website at http://www.ncgr.org. Microsoft Windows, Apple MacOS, and Unix versions are available. Enhancements since the initial beta include a discovery tool for finding new pathways in user-defined metabolic networks.
A typical query might be to ask for all steps that appear in the leaf of a particular plant species but not in the root, or that appear at one stage of development but not another. With the discovery tool, instead of being limited to known pathways that the curators have included in the database, users can work with any set of connected reactions, or ask for all pathways connecting two compounds.
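Both kinds of query described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm the PathDB client uses, which the article does not specify. The function names and toy data are hypothetical. Each reaction is simplified to a single substrate and product, and a "pathway" is taken to mean a simple path that visits no compound twice.

```python
def steps_in_a_not_b(step_locations, present_in, absent_from):
    """Steps recorded in one tissue or stage but not in another."""
    return sorted(
        step
        for step, locations in step_locations.items()
        if present_in in locations and absent_from not in locations
    )


def pathways_between(reactions, source, target):
    """Enumerate all simple paths of reactions from source to target.

    reactions: list of (step_name, substrate, product) tuples.
    Uses depth-first search over the directed compound graph.
    """
    outgoing = {}
    for name, substrate, product in reactions:
        outgoing.setdefault(substrate, []).append((name, product))

    results = []

    def dfs(compound, visited, path):
        if compound == target:
            results.append(list(path))
            return
        for name, nxt in outgoing.get(compound, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append(name)
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(source, {source}, [])
    return results


# Toy data (hypothetical step and compound names).
step_locations = {"s1": {"leaf", "root"}, "s2": {"leaf"}, "s3": {"root"}}
reactions = [
    ("r1", "A", "B"),
    ("r2", "B", "D"),
    ("r3", "A", "C"),
    ("r4", "C", "D"),
    ("r5", "D", "A"),
]
leaf_only = steps_in_a_not_b(step_locations, "leaf", "root")
routes = pathways_between(reactions, "A", "D")
```

Enumerating every simple path can grow exponentially in a densely connected network, so a practical tool would likely bound path length or rank the results. That is a design consideration for the sketch, not a claim about PathDB.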
The official software release is planned for January 13, in conjunction with the Plant & Animal Genome IX Conference in San Diego. The January release will include essentially the same functionality as the current beta. Developers are now concentrating on quality assurance and making the software faster. Blanchard said that further enhancements over the next 18 months will focus on the discovery tool, the least mature component of the system.
The database will remain publicly available on the NCGR website, Blanchard said. A simple Web query capability is available, and an SQL interface is also envisioned for the future. The Java tools are likely to remain freely downloadable as well, he added, and will in any case remain free to academic and non-profit users. Paid licensing arrangements will be available for companies interested in modifying the code and incorporating it in their own products.
“There’s a lot of data that’s going to be out there and a lot of tools for working with it,” Blanchard said. “We’ll make the most progress by working together.”
—Sherri Chasin Calvo