Having secured a substantial chunk of funding from the Canadian government, the Biomolecular Interaction Network Database is finally ready to scale up its operations, hoping to become the GenBank for molecular interactions of proteins, nucleic acids, and small molecules.
In April, Genome Canada, which administers funds from the Canadian federal government, awarded the project a CA$12.5 million ($8.2 million) grant, adding to a CA$5.2 million ($3.4 million) contribution from the Ontario Research and Development Challenge Fund. This government funding matches the CA$4.5 million ($2.9 million) each that MDS Proteomics and IBM pledged a year ago.
“It has taken us about a year to raise the required funding to take the database out of being run by my graduate students…and to put it into a professionally staffed organization that has the time and the bandwidth to deal with the increasing amount of interaction data in the world,” said Chris Hogue, principal investigator of the BIND project and a senior scientist at Mount Sinai Hospital’s Samuel Lunenfeld Research Institute in Toronto. Part of the credit goes to Blueprint, a non-profit organization set up by IBM, MDS Proteomics, and the SLRI for fundraising purposes.
So far, the database has been housed in the laboratory of Francis Ouellette, a bioinformaticist at the University of British Columbia in Vancouver. Hogue said it will soon move into its own premises in Toronto, probably by the end of the summer, where the team will set up hardware provided by IBM. By the end of the year, he hopes to hire a staff of about 25, mostly software developers and database curators.
Their main task will be to fill the database, which so far contains only about 7,000 interaction, complex, and pathway entries — more than half from yeast — with data from the literature. “We have a lot of catching up to do,” said Hogue. For the first year or so, they will concentrate on adding yeast interactions, an estimated total of 25,000. Part of the funding from ORDCF is specifically dedicated to this project, Hogue said. The next goal will be to include interactions from human disease gene products, as well as a set of interactions Hogue and his colleagues recently extracted in an automated fashion from the Molecular Modeling Database, which contains experimentally determined biopolymer structures from the Protein Data Bank. To enter BIND, these interactions will first have to be turned into a high-quality reference set by curators, Hogue said.
Software development will be another focus of the new team, which will build tools for data indexing and curation, for importing large datasets, and for visualizing interaction networks and biochemical and signaling pathways, and will also improve the BIND user interface.
Apart from catching up with the literature, Hogue will also encourage scientists to submit their data to BIND. Eventually, he would like to see journals require authors to deposit their interactions in his database, but so far, only Genome Research has signed on. “It will take us a year before we are able to operate like EMBL or GenBank or DDBJ where we can take a lot of data submissions and turn them quickly into records,” Hogue cautioned. “I have to have enough people available as curators before I can go and ask journals and funding agencies to require interactions [to be submitted to BIND].”
Making the database comprehensive is surely an important task, but in the end, a database is only as good as its entries. Many scientists believe that a large number of the interactions reported in the literature are not real. Just last month, an analysis paper in Nature estimated that “more than half of all current high-throughput data are spurious.” Hogue said he is aware of this problem, and would like to collaborate with researchers who could develop suitable filters and verify interactions by other experimental techniques. “But as a primary archive, you want to store all the raw information and then apply filtering on top of that,” he said.