Last June, IBM and MDS Proteomics announced with fanfare their support of a database to hold information on all sorts of protein interactions — with other proteins, RNA, DNA, even small molecules. BIND, as the database is known, “is poised to become the definitive source of biomolecular interaction data,” MDS and IBM said at the time in a statement.
Seven months later, however, BIND has yet to acquire the resources to scale up its operations, and consists overwhelmingly of yeast protein interaction data taken from yeast two-hybrid studies. The database has grown from about 5,800 yeast protein interactions to slightly fewer than 7,000 since June, but “we’re still waiting for the big money” to start tackling human protein interactions, said Francis Ouellette, a bioinformaticist at the University of British Columbia who also serves as BIND’s administrator.
That money has so far not been forthcoming. MDS and IBM each pledged $4.5 million last June, but Ouellette has yet to spend the money because he’s waiting to see if BIND will win a $19 million grant from Genome Canada, a Canadian government initiative that requires a certain fraction of matching funds from industry. Ouellette is also in discussions with an unnamed source for additional funds “on about the same scale” as the $19 million government grant, he said.
But the critical question facing BIND’s administrators, even if they receive the money they’re requesting, is how to beat out the 40 to 50 other protein interaction databases and become the de facto standard. Winning that contest matters not only to Ouellette and his colleagues but also to IBM and MDS Proteomics, whose plans to market MDS’ proprietary database, which is also based on the BIND model, depend on the scientific community’s acceptance of BIND.
“In about six to eight weeks you should hear about [the Genome Canada grant], and there could be other things that lead to major funding over the next few months,” Ouellette said. “We’re working very hard at it.”
In the meantime, Ouellette and the other principal investigators on the project — Chris Hogue, a bioinformaticist at the University of Toronto, and Tony Pawson, a cell signaling researcher also at the University of Toronto — are devoting a portion of their laboratory resources and manpower towards filling up the database with easily accessible records such as those compiled in PDB, the database of protein structural information administered by Rutgers University, the San Diego Supercomputer Center, and the National Institute of Standards and Technology.
Assuming BIND gains access to funds, Ouellette said, the first step will be to go back and validate much of the yeast two-hybrid data that now populates the database, since the technique is notorious for false positives. Building a critical mass of protein interactions across a range of organisms and pathways is also key to attracting scientists, Ouellette said, because researchers currently have no use for a database that doesn’t include the particular pathway they study.
“[When] we have enough different types of organisms and pathways, not necessarily so that everyone’s genes are in there, but enough related families, I think people will recognize it as a standard,” Ouellette said.
Partnering with scientific journals presents another avenue for gaining wide acceptance. BIND has an agreement with the journal Genome Research to require all authors to submit their interaction data to BIND upon publication, and the National Center for Biotechnology Information (NCBI) has expressed interest in integrating BIND with its Entrez portal. “That would definitely bring people to the trough to have a drink,” he said.
On the other hand, Ouellette has yet to convince Nature and the HighWire Press journals, which include Science, to commit to a similar agreement. The holdup, said Ouellette, is the cost for both parties of keeping up with the flow of new data.
Costs have restricted other activities as well, including BIND’s ability to develop software allowing direct submission of data, software for visualizing protein interactions, and the hiring of curators for processing and validating the information. With enough money, only scientists familiar with a particular organism or biological pathway would oversee how that data is represented in the database, he said. “But in some cases, the researcher might have passed on” or be otherwise unreachable, making less qualified curators the only option, Ouellette said.
But Ouellette doesn’t claim sole responsibility for the work of building a respectable database. Part of that burden lies with the community of protein researchers at large, whom he encourages to offer constructive criticism and to alert the database administrators when they find mistakes. BIND has even published the inner workings of its database model, he said, giving researchers further food for thought.
“It’s important to have people go and look at it and let us know what they think,” he said. “A database is only as good as the people [who use it].”