As part of its recently announced participation in the Gene Ontology Consortium, Compugen has become the first commercial entity to donate annotated genomic data to the project.
The company donated the non-proprietary portion of its Gencarta genome, transcriptome, and proteome database to the effort. Gencarta contains hundreds of thousands of annotated transcripts and their predicted proteins for humans as well as other organisms. It is built on Compugen’s LEADS technology, which includes the modeling of alternative splicing, for the analysis of genomic, expressed, and protein data. “We are donating the part that contains the original mRNA and the original proteins from the public databases,” said Liat Mintz, director of genomic research at Compugen.
Mintz said that Compugen has been using GO terms to annotate Gencarta for almost a year.
The annotated data are included in the GO gene association files, adding 114,183 new gene products associated with one of the three organizing principles in the ontology: molecular function, biological process, or cellular component. The GO gene association files currently contain information on Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus, Arabidopsis thaliana, Schizosaccharomyces pombe, and Caenorhabditis elegans, for a combined total of 24,682 associated gene products.
Compugen’s data mark the first inclusion of human genomic data in the files, although Mintz noted that some terms used for mouse are the same as those used for human.
Inclusion in the gene association files represents only the first step in Compugen’s involvement in the GO effort, Mintz said. The annotated data serve as a means of suggesting changes, additions, and corrections to the terms and definitions used in the ontology. Adoption of GO terms based on the annotated data is an ongoing process.
“We’re working with them together to have more genes attached and marked with GO terms to help genes get into standardized annotation,” Mintz said.
Michael Cherry, associate professor in the department of genetics at Stanford University School of Medicine and head of the project, said, “This significant contribution will be integrated with the vast and growing body of data from functional analysis of genes, helping us in our efforts to develop one consistent language to describe and mine genomic and gene expression data.”
In addition to Compugen, AstraZeneca, Incyte Genomics, the National Human Genome Research Institute, and the Medical Research Council, London, support the efforts of the GO Consortium.