Last month, the NIH awarded a three-year, $15 million grant to unite Swiss-Prot, Trembl, and the Protein Information Resource (PIR) into a single resource called the United Protein Database, or UniProt.
When UniProt is ready, the data will remain freely available to academic researchers, but it is not yet clear whether companies, which have to pay for access to Swiss-Prot until the end of 2004, will obtain free access afterwards.
Currently, commercial users of Swiss-Prot need to obtain a license from GeneBio, a Geneva-based company that has an exclusive distribution agreement with the Swiss Institute of Bioinformatics. The agreement runs out at the end of 2004. After that, everything is open, according to Nasri Nahas, GeneBio’s COO.
One scenario, Nahas believes, is that the current licensing model will continue. “It’s a model that has proven to be efficient,” he said. “Maybe the NIH will ask us, with our extensive database of customers and our experience, to distribute UniProt.” At the moment, annual licensing fees range from $5,000 for a single user to $100,000 for unlimited usage at a company’s site, according to Nahas. Licensees include all the major pharmaceutical and life science companies.
But Rolf Apweiler, who has led the Swiss-Prot group at the European Bioinformatics Institute since 1994 and who will head the integration effort, said in an interview with ProteoMonitor’s sister publication BioInform that he would like to see licensing fees eventually removed. However, no funding is currently available to replace industry fees, he said. According to Apweiler, more than 50 public servers currently provide access to the data, and 4,000-5,000 companies have in-house local copies.
Swiss-Prot, a hand-curated protein sequence database founded in 1986, currently contains 114,000 entries, while Trembl, a computer-annotated supplement to Swiss-Prot, holds entries for 700,000 proteins. Both databases are maintained by the EBI and the SIB. PIR, a joint effort between Georgetown University Medical Center and the National Biomedical Research Foundation, contains 283,000 entries, which are organized into protein families and superfamilies according to their sequence similarity. By the end of the three-year grant, UniProt is expected to hold more than two million entries.
The new database will consist of two parts: the Swiss-Prot section, which will contain fully annotated entries, and the Trembl section for computer-annotated records that have yet to be hand-curated. The PIR group will assist in this annotation process; all of the PIR entries will be integrated into UniProt. Apweiler will be assisted by co-investigators Amos Bairoch, a group leader at the SIB and founder of Swiss-Prot, and Cathy Wu, who currently oversees PIR.