The increasing availability of free public genomic data has long cast doubt on the wisdom of the genomic-database-subscription business model. But a new company says proteomic data is a different story.
To great fanfare last week, Confirmant, the joint venture launched in June 2001 by Oxford GlycoSciences and the telecommunications firm Marconi, introduced a prototype of its “experimentally validated” human proteomic database at Cambridge Healthtech Institute’s Genome Tri-Conference in Santa Clara, Calif. Its mission: “To be the leading provider of bioinformation, offering customers a protein-centric view of the genome and a deeper understanding of disease.”
Company officials told ProteoMonitor that their so-called Protein Atlas, which will eventually contain data for every protein expressed in as many different tissues as possible, is commercially viable because its content simply won’t be eroded by public-sector efforts. They said that an approach such as theirs — one that systematically derives proteins from tissue samples and then maps them back to exons on the Golden Path version of the human genome using peptide sequence tags, or PSTs — would be too complex, costly, and massive an undertaking for any public-sector effort. “High-throughput expertise is required for this and we’re four years ahead of any others who would jump in,” said Confirmant’s CTO Jonathan Sheldon.
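For readers who want a feel for what mapping peptide sequence tags back to exons involves, here is a minimal sketch in Python. It is not Confirmant's method, which the company did not describe in detail. It assumes forward-strand exon DNA, the standard genetic code, and tags that fall entirely within one exon; the function names and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an illustration, not Confirmant's pipeline).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate a DNA string in one forward reading frame (0, 1, or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Unknown codons (e.g. containing N) become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def map_tags_to_exons(tags, exons):
    """Return {tag: [(exon_name, frame), ...]} for every exon whose
    translation contains the tag. `exons` maps hypothetical exon names
    to forward-strand DNA sequences."""
    hits = {}
    for tag in tags:
        hits[tag] = [
            (name, frame)
            for name, dna in exons.items()
            for frame in range(3)
            if tag in translate(dna.upper(), frame)
        ]
    return hits


# Toy data, invented for illustration only.
toy_exons = {"exon_1": "ATGGCCAAGCTGGAA", "exon_2": "CCGGTTTGGCAT"}
toy_hits = map_tags_to_exons(["AKL", "RFG"], toy_exons)
```

A production effort would also have to handle the reverse strand, tags that span splice junctions, and residues that mass spectrometry cannot tell apart, such as leucine and isoleucine, which is part of why Sheldon describes the work as a high-throughput undertaking.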
What about HUPO’s aims? “Those guys are the first to admit that it’s only this year that they’re even going to define what a public domain effort looks like,” said Sheldon. And as for any existing protein databases, he added: “Our information is experimentally derived, so the value is far greater than computationally derived data.” Only about 20 percent of annotated genes in the public sector databases have all the exons predicted exactly, he noted, and the public data contain numerous pseudogenes — non-protein-coding regions that have been misidentified as genes.
Confirmant’s Oxford, UK-based managers added that no one in the private sector has attempted such a project on the scale they have. (They apparently named their product with no knowledge of the 30-year-old Atlas of Protein Sequence and Structure, the first-ever macromolecular sequence repository, which was published by the US National Biomedical Research Foundation.)
Global Samples, Co-localized Fragments
Confirmant’s major expense, said Sheldon, is biosourcing. The company purchases diseased and normal blood, tissue, and cell-line samples from a variety of sources worldwide and characterizes the proteins in them using any of several technologies (1D and 2D gels, fractionation, ICAT reagents) together with Oxford GlycoSciences’ tandem mass spectrometry method. The data are generated by OGS’s wet labs and turned over to Confirmant without restriction for incorporation into the database.
Explained Sheldon: “We’re not just trying to isolate the protein once. We’re trying to isolate it in a variety of cell lines and tissues and body fluids. That will get you an angle on the different splice variations you see in tissues.”
Alternative splicing, he said, is important especially for temporal and tissue-specific expression. “Current wisdom says there are between five and 10 different splice forms per protein, depending on the gene you’re looking at … and these different protein isoforms and splice variants have been shown to be associated with different diseases.”
For instance, Sheldon cited data that Confirmant has derived from cerebrospinal fluid: The company found 300 genes that encoded more than 1,500 different protein isoforms, a mix of splice variants, SNPs, and different posttranslational modifications. “To give that a disease relevance,” Sheldon said, “in the CSF we found over 28 different isoforms, and if you look at the levels of that particular protein in a normal and a schizophrenic sample you find only one of those actually varies quantitatively in schizophrenia.”
Sheldon said the Protein Atlas in its current state contains data for 7,000 protein-coding genes from numerous tissue types and disease states. “You name it, we’ve got it: hearts, livers ... cancer, Alzheimer’s, all kinds of diseases,” he said.
By its official commercial debut (planned for Cambridge Healthtech’s Beyond Genome conference in June) the Protein Atlas will contain information for 10,000 genes, he said.
Sheldon, a former Roche bioinformatics director, said the goal is to catalogue protein data for 30,000 genes by June 2003 and to expand the database over time by adding information on genes linked to the progression of disease states and, perhaps eventually, protein-protein interaction data.
Confirmant plans to sell royalty-free three-year subscriptions to pharmaceutical and large biotech companies. Sheldon, who wouldn’t comment on pricing but did not dispute previous reports that annual subscriptions would be priced at £2 million, or roughly $2.9 million, said there would be “a variety of offerings, priced accordingly.” For instance, customers might choose to purchase data for just one disease area, he said.
The 50-50 joint venture received initial cash funding of £30 million last June and currently employs 20 full-time and contract staff. Terms of the agreement call for Confirmant to pay Oxford GlycoSciences £5 million for exclusive marketing rights to intellectual property on selected proteome databases and £1.5 million to license data-analysis software. Additionally, Marconi will invest £10 million in Oxford GlycoSciences through the purchase of company shares.