BEVERLY, Mass.--Proteome, a provider of curated protein databases for the pharmaceutical industry, has launched its third product, WormPD, a library of data from the model organism C. elegans. Like the company's yeast and Candida albicans proteome collections, which have already won the four-year-old firm more than 20 customers, the worm proteome database contains genomic sequence data, results from gene expression studies, and a comprehensive review of relevant scientific literature. The three interconnecting volumes form what Proteome calls its BioKnowledge Library, "designed to help researchers rapidly locate specific information about gene expression, to better understand biological pathways, and to identify potential pharmaceutical targets."
"Knowledge management" is how Henry Oettinger, marketing manager, described Proteome's mandate. Unlike most other database vendors, Proteome is not in the business of generating data. Instead, it specializes in packaging what already exists.
A staff of PhDs with job titles such as Worm Curator pores over scientific journals and electronic repositories, building products aimed at "reducing the complexity of the information landscape," according to the company mission statement. Established in 1995 by James Garrels, former director of the Quest Protein Database Center in Cold Spring Harbor, NY, and Joan Brooks, who was a senior scientist at New England BioLabs here, the privately held company has licensed its databases to most major pharmaceutical companies, including Astra, Bristol-Myers Squibb, DuPont Pharmaceuticals, Glaxo Wellcome, Hoechst Marion Roussel, Merck, Monsanto, Pfizer, SmithKline Beecham, and Zeneca.
Explaining the company's success, Oettinger contended that while sequence information is no longer creating a bottleneck in drug discovery, getting good functional annotation for the genes is. "Information by itself without biological context can be meaningless," Oettinger said. "One experiment for one type of condition using a particular genome could generate 50,000 datapoints. Datapoints are going to add up to the tens of millions, or even billions, and all that information is coming to pharmaceutical researchers." He continued, "They might be able to isolate 100 sequences from a chip-reading experiment, but they still have to take that information, go to a library, and try to understand the function of each of those 100 genes." Proteome's databases offer researchers a way to obtain gene function information with one click from one location, he said.
Part of the value of Proteome's products lies in their ability to help customers reduce false readings and avoid what Oettinger called "annotation catastrophe": "An error that's generated early on in a piece of data can get used again and again if it's done automatically or by a robot. That doesn't happen with the Proteome databases, because human beings, who are all PhD-level scientists, curate them."
The company's relationship with the academic community also serves as a quality control measure, Oettinger noted. Because Proteome's databases are available online at no cost to academic users, many contribute suggestions, corrections, and updates. "Somebody might say, 'Oh, I've got a better reference for that,' and send it in so our editors can get a more applicable piece of information about the gene or protein," he explained.
Regular updates ensure that commercial customers get access to newly uncovered data within a week. Added Oettinger, "It means that if an experiment is done in Europe and reported in a European journal, researchers in US pharmaceutical companies don't have to go to their library and read every journal for the previous week to learn about that one experiment. They can go to their protein report page and see the annotation that's been added."
At its customers' request, Proteome is now developing a software tool that would enable users to load proprietary genomic information and generate internal protein report pages for their own data. The company's software engineers are also at work on a tool that would allow users to search results of a transcript-profiling assay. "The tool itself would be part of the functional genomic analysis, such that researchers would put in results and get back from our database a list of title lines and properties for that gene. They could learn something of the function very quickly and decide whether they want to keep that or throw that gene out as a potential target for their research," said Oettinger.
To date, Proteome's own computer engineers have developed all of the systems, protocols, and methodologies that enable its curators to collect and present proteomic data in its databases. "We haven't relied on any outside sources for code writing or data collection," Oettinger said, adding, "The way we produce a high quality database is our trade secret."