Current BioData, a joint venture between a bioinformatics company and a scientific publishing consortium, is hoping to pioneer a new model for delivering biomedical content with its first product, a curated database of druggable protein targets that is set to launch in the first quarter of 2007.
The company was formed in March 2005 as a joint venture between Geneva Bioinformatics and the Science Navigation Group (formerly Current Science Group) — the parent company of BioMed Central and several other independent scientific publishing companies.
Current BioData’s vision for its first product, dubbed the Targeted Proteins Database (TPdb), is in step with an emerging trend in the bioinformatics and publishing communities to blur the line that has traditionally separated databases from journals.
The rise of electronic and open access publishing has caused many journals to take on certain qualities traditionally associated with databases, while the proliferation of biological data is driving demand for carefully curated databases that help researchers prioritize new information.
TPdb is built upon a curated database that GeneBio originally began developing in 2001 under the name of ProXenter, which was focused on protein families of particular interest to the pharmaceutical industry. The database was built upon the information in SwissProt, which GeneBio marketed before the resource reverted to a publicly funded model (and was renamed UniProt) in 2003. After several years of developing the database under a traditional bioinformatics and curation model, “we realized that this was pure editorial work that we were doing,” Nasri Nahas, CEO of GeneBio, told BioInform.
As a result, the company sought out partners in the publishing field, and found a kindred spirit in Vitek Tracz, founder and chairman of the Science Navigation Group. Tracz “very quickly understood the concept, and this was 100-percent in line with what he thinks about database publishing and the future. So we were all on the same page,” Nahas said.
This week, Current BioData announced that it had hired five new senior staff members to prepare for the impending launch of TPdb. The new hires reflect the company’s hybrid pedigree: Ian Tarr, formerly executive vice president at Thomson Scientific, has joined as CEO; Stephanie Kappus, an alum of GeneBio and the European Bioinformatics Institute, serves as chief technology officer; Rebecca Lawrence, formerly publishing manager for the Drug Discovery Today series at Elsevier, has joined as director of editorial services; Laura Thomson, from the British Standards Institute and Thomson, is director of business development; and James Jacketti, formerly US sales manager for Thomson’s Current Drugs, has been appointed senior vice president of sales and marketing.
Tarr noted that GeneBio’s intention to “add layers of evaluation and commentary” to the core information in SwissProt “is not really a bioinformatic/proteomic type skill. It’s more of a publishing skill in terms of commissioning authors, commissioning experts to comment and put context around information, and so on.”
TPdb “does not just give you the SwissProt entry” for proteins, Tarr said. “We’ve specially commissioned a review of the literature that’s written for a drug company researcher that says why [each protein] is particularly interesting as a potential target for a drug.”
When the database launches in the first quarter of 2007, it will include information on around four or five protein families, or modules, Tarr said. Over the course of 2007, the company will build that up to 20 or 30 modules. “The ones we choose will depend on what the customers select as the most interesting ones,” he said.
Current BioData’s curation staff is currently in the neighborhood of “a couple dozen,” Tarr said, and the company will add more employees over the course of 2007 as it moves into new protein areas.
The database will be updated daily, Tarr said. GeneBio has provided some text-mining technology to assist the Current BioData editors in quickly accessing information related to a particular topic, but Tarr noted that the bulk of the curation and annotation will be carried out manually. “Text mining is not perfect,” he said.
For each protein area, Current BioData has created an advisory board of around 50 or 60 scientists called a “virtual faculty” that Current BioData editors can use as a sounding board for assessing the importance of new information. The idea is to “put a sort of quality filter” on information in the database, Tarr said, noting that a number of potential TPdb subscribers have requested that feature.
“If you just set up a PubMed search to track a particular protein, you get so much information that you want to have some sort of quality filter on it to say, ‘Why is it interesting, what is it associated with, is it something new, or is it just a reaffirmation of known findings?’” he said.
In addition, Current BioData plans to create a family of open-access peer-reviewed journals focused on the protein families in the TPdb. These journals, to be published under the name of CBD Research Journals, will be hosted by BioMed Central.
“One thing we want to do with papers that have been published in these journals is to provide mechanisms for our authors to create lots of rich linking into the database and to deposit results,” Tarr said. “We’ll structure the paper in a way so that, in effect, it becomes like a mini database entry.”
This capability would align with other emerging efforts that are at the intersection of journal publishing and bioinformatics databases. At a “New Frontiers” session dedicated to scientific publishing at the Intelligent Systems for Molecular Biology conference this August, Phil Bourne of the Department of Pharmacology at the University of California, San Diego, pointed out that new technologies like the semantic web and the rise of open access publishing are enabling journal articles to be marked up with metadata at the time of publication so that they can be dynamically linked to other resources in the field [BioInform 08-11-06].
He noted, however, that “the challenge is capturing the concepts that need this linking at the time the paper is written.”
A research group at the EBI is already working on linking the two types of content. In a collaboration with UK’s PubMed Central announced in August, EBI researchers are developing automated ways to hyperlink all molecular entities mentioned in the PubMed Central archive to records in public data resources [BioInform 08-04-06].
In a separate interview at ISMB, Amos Bairoch, head of the SwissProt group at the Swiss Institute of Bioinformatics, told BioInform that “there will be a lot of work to be done in collaboration between databases and journals,” and raised the question, “Are journals going to become themselves database providers?” [BioInform 08-11-06].
Bairoch, who is also head of Current BioData’s Scientific Advisory Board, said at the time that the field was likely to move toward a mixed model. “You’re going to have journals that are going to do this, you’re going to have journals that are going to collaborate with public databases, and you’re going to have databases that are going to become publishers and compete with editors,” he said.
As for GeneBio, it’s out of the database business. Nahas said that the former SwissProt steward has turned its efforts completely to software development after launching the Current BioData joint venture.
Nahas said that GeneBio holds a minority stake in Current BioData, but did not provide further financial information.
GeneBio is developing several new proteomic analysis packages to supplement its current Melanie 2D gel analysis software and its Phenyx mass spec peptide fingerprinting package. Nahas said that the new products will likely launch in the first half of 2007, but did not provide further details.