UniProt Goes Live
UniProt, the comprehensive protein database formed by combining the resources of Swiss-Prot, Trembl, and the Protein Information Resource, marks its official launch on Dec. 15.
The $15 million, three-year project, first announced in late 2002 [BioInform 10-28-02], is a collaboration between the European Bioinformatics Institute, the Swiss Institute of Bioinformatics (SIB), and Georgetown University.
Describing the structure of the database as a “wedding cake,” EBI’s Rolf Apweiler, UniProt’s principal investigator, said that “each tier of the cake represents a different database, optimized for different uses.”
At the base of the project is the UniProt Archive (UniParc), a comprehensive non-redundant protein sequence database that is updated every day with protein sequences from the public databases: Swiss-Prot, Trembl, and PIR, as well as EMBL/DDBJ/GenBank, Ensembl, the International Protein Index, the Protein Data Bank, RefSeq, model organism databases such as FlyBase and WormBase, and the European, US, and Japanese patent offices. UniParc provides cross-references to the source databases, sequence versions, and status.
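As a rough illustration of what a non-redundant archive with cross-references means in practice, the following Python sketch stores each distinct sequence once and attaches every source entry that reported it. The class name, method names, and accessions are hypothetical and chosen only for this example; this is not UniParc's actual schema or software.

```python
from collections import defaultdict


class SequenceArchive:
    """Toy non-redundant archive: one record per distinct sequence (hypothetical design)."""

    def __init__(self) -> None:
        # Maps each sequence to the set of (database, accession, version)
        # cross-references that reported it.
        self._records = defaultdict(set)

    def add(self, sequence: str, database: str, accession: str, version: int = 1) -> None:
        """Register a source entry; identical sequences share a single record."""
        self._records[sequence.upper()].add((database, accession, version))

    def __len__(self) -> int:
        """Number of distinct sequences held in the archive."""
        return len(self._records)

    def cross_references(self, sequence: str) -> list:
        """Return the sorted source entries for a sequence, or an empty list."""
        return sorted(self._records.get(sequence.upper(), set()))


archive = SequenceArchive()
# The accessions below are made up for illustration.
archive.add("MKTAYIAKQR", "Swiss-Prot", "X00001", version=2)
archive.add("mktayiakqr", "RefSeq", "YP_000001")   # same sequence, different source
archive.add("MSLLTEVETP", "PIR", "A00002")

print(len(archive))                            # distinct sequences stored
print(archive.cross_references("MKTAYIAKQR"))  # every source entry for that sequence
```

Keying the records on the sequence itself is the simplest way to guarantee non-redundancy in a sketch like this; a production system would more plausibly key on a checksum of the sequence.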
The next layer of the structure is the UniProt Knowledgebase, which brings all the data from UniParc together for each protein. Researchers will be able to submit protein sequences directly to the Knowledgebase using a new web-based submission tool called SPIN, which replaces Swiss-Prot’s e-mail-based submission system.
The top tier of the system, UniRef, comprises three sub-layers: UniRef100, UniRef90, and UniRef50, which combine closely related sequences into a single record. UniRef100 is a non-redundant version of all the sequences in the Knowledgebase, UniRef90 collapses all the sequences that are 90 percent or more identical into a single record, and UniRef50 collapses sequences that are at least 50 percent identical.
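The effect of collapsing sequences at different identity thresholds can be sketched with a small Python example. It is a toy under stated assumptions: the `identity` function simply compares positions without any alignment, the greedy clustering strategy is an illustrative choice, and the sequences and names are invented. It is not the procedure the UniProt consortium uses to build UniRef.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length (no alignment)."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def cluster(seqs: dict, threshold: float) -> dict:
    """Greedily assign each sequence to the first representative it matches."""
    clusters = {}
    # Process longest sequences first so each representative is the
    # longest member of its cluster; ties are broken by name.
    for name, seq in sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        for rep in clusters:
            if identity(seqs[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Invented sequences for illustration only.
seqs = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "MKTAYIAKQK",  # 9 of 10 positions match P1
    "P4": "MKTAYGGGGG",  # 5 of 10 positions match P1
    "P5": "WWWWWWWWWW",  # unrelated
}

for label, threshold in (("100", 1.0), ("90", 0.9), ("50", 0.5)):
    print(label, cluster(seqs, threshold))
```

With these five sequences the record count shrinks as the threshold drops: four clusters at 100 percent, three at 90 percent, and two at 50 percent, which mirrors the way each UniRef sub-layer is more compact than the one below it.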
UniProt is available at http://www.uniprot.org, and the individual members of the UniProt consortium still have their own web pages at http://www.ebi.uniprot.org, http://expasy.uniprot.org, and http://www.pir.uniprot.org.
Chimp Genome Sequence Released along with Human Alignment
The National Human Genome Research Institute last week announced that a first draft of the genome sequence of the chimpanzee (Pan troglodytes) is now available in the public domain, along with its alignment with the human genome.
NHGRI funded teams at the Broad Institute of the Massachusetts Institute of Technology and the Genome Sequencing Center at Washington University School of Medicine in St. Louis to sequence and assemble the genome.
The initial assembly, based on four-fold sequence coverage and assembled with the Arachne program, was deposited in GenBank, and is also available via EMBL and DDBJ.
The assembly and its alignments with the human genome are available via the University of California, Santa Cruz, Genome Browser; NCBI’s Map Viewer; and EBI/Sanger’s Ensembl system.
The UCSC Genome Browser team said that its alignments were generated using the blastz program developed at Pennsylvania State University and the programs Blat, axtChain, chainNet, and netSyntenic developed at UCSC by Jim Kent.
The Ensembl team said it expects to have a fully annotated chimp assembly in early 2004. In the meantime, it is providing initial gene structures determined by matching Swiss-Prot and RefSeq to the chimpanzee genome.
Entelos Expands J&J Research Collaboration
Entelos last week said that it will expand its research collaboration with Johnson & Johnson Pharmaceutical Research & Development (J&JPRD) to include target validation, lead optimization, and clinical development in the field of obesity.
J&JPRD sister company McNeil Nutritionals, a division of McNeil-PPC, will also join the collaboration with Entelos.
Entelos and J&J began collaborating on diabetes research in June 2002, using the Entelos Metabolism PhysioLab platform.
Cellomics Integrates its Storage Solution with EMC Centera
Cellomics last week said that it has integrated its Cellomics Store database for high-content screening data with EMC's Centera Compliance Edition content-addressed storage solution, which is optimized for fixed content.
The integrated storage solution provides software and hardware for the management, storage, analysis, and archiving of cellular image data, Cellomics said. Users will be able to use the system to create, manage, and analyze high-content screening experiments and cellular image data, and to discover relationships among potential targets, lead compounds, and genes.
Elsevier to Shut Down BioMedNet, other Science Portals
Scientific journal publisher Elsevier last week said that it plans to pull the plug on BioMedNet and its other scientific portals in order to cut costs.
“During our six-year association with virtual community portals in the science and technology arena, Elsevier has tried a number of different business models in an attempt to make these portals self-sustaining, with only limited success,” Elsevier spokeswoman Marike Westra told BioInform via e-mail. “Having carefully reviewed the options available to us, we have decided that future marketing investments will be made in other areas and that the investments in the science and technology portals BioMedNet, Chemweb and ElsevierEngineering.com will be withdrawn.”
BioMedNet provides access to Elsevier's online journals, as well as research tools such as a database of phenotypic and genotypic information on mouse knockouts, a pharmacological targets database, and Medline. “We are currently evaluating how to integrate our essential services and products that are hosted on our portals within alternative solutions,” Westra said, adding that the portals “will continue to operate as usual until the integration work is completed.”
NSF to Award $14.5M for Integrated Informatics
According to a recent program announcement, the National Science Foundation plans to award around $14.5 million in grants for its Science and Engineering Information Integration and Informatics (SEIII) program in 2004.
The program supports IT development for a range of science and engineering domains, including biology. The goal is to focus IT research on “addressing problems that will enable scientific discovery via analysis of large data sets or information resources.”
SEIII encompasses two related components: Science and Engineering Informatics (SEI) and Information Integration (II). A “special emphasis” will be placed on “domain-specific and general-purpose tools for integrating information from disparate sources,” according to the program announcement.
NSF plans to award between 25 and 30 grants under the program. The proposal deadline is March 4, 2004.
More information is available at http://www.nsf.gov/pubs/2004/nsf04528/nsf04528.htm.
Integrated Genomics Shares Pathway Data with Ariadne
Ariadne Genomics has entered a strategic partnership with microbial genomics firm Integrated Genomics. Under the agreement, pathway information from Integrated Genomics’ ERGO collection of curated microbial and eukaryotic genomes will be added to Ariadne’s PathwayAssist desktop software for pathway visualization and analysis.
PathwayAssist is sold as a part of the IobionLab suite of biological software tools and distributed by Stratagene.
Biobase to Distribute Molecular Connections' NetPro
Biobase of Wolfenbüttel, Germany, said last week that it has begun distributing NetPro, a protein-interaction database created by Molecular Connections of Bangalore, India, on a non-exclusive basis.
NetPro contains over 30,000 annotated protein-protein interactions based on data extracted from scientific literature, according to the companies.