Win Hide and colleagues from the South African National Bioinformatics Institute will be among the participants in this week’s MGED3 meeting at Stanford University, where they hope to gain international acceptance for their controlled vocabulary for expression state information.
While the Gene Ontology Consortium is developing ontologies for molecular function, biological process, and cellular component, and the EBI’s Microarray Gene Expression Database group is building an ontology for microarray experiments, the SANBI researchers saw a need for a standardized vocabulary covering all forms of expression information, including ESTs and SAGE tags.
SANBI post-doc Soraya Bardien-Kruger is working with the University of Pennsylvania’s Chris Stoeckert and Mark McCarthy of the Imperial College School of Medicine to construct a controlled vocabulary of expression state terms that describe a gene. The vocabulary is broken into four categories: organ system, developmental stage, pathology, and tissue type.
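As a rough illustration of what a four-category controlled vocabulary could look like in practice, the Python sketch below maps free-text labels onto canonical terms. Only the four category names come from the SANBI description; the class name, its methods, and the example terms and synonyms are hypothetical and are not taken from the SANBI vocabulary itself.

```python
# The four categories named in the SANBI description.
CATEGORIES = ("organ system", "developmental stage", "pathology", "tissue type")


class ControlledVocabulary:
    """Hypothetical lookup table: category -> {lowercased label -> canonical term}."""

    def __init__(self):
        self._terms = {category: {} for category in CATEGORIES}

    def add_term(self, category, canonical, synonyms=()):
        """Register a canonical term and any free-text synonyms for it."""
        if category not in self._terms:
            raise ValueError(f"unknown category: {category}")
        for label in (canonical, *synonyms):
            self._terms[category][label.strip().lower()] = canonical

    def normalize(self, category, label):
        """Return the canonical term for a label, or None if it is not known."""
        return self._terms.get(category, {}).get(label.strip().lower())


# Example terms are invented for illustration only.
vocab = ControlledVocabulary()
vocab.add_term("tissue type", "liver", synonyms=("hepatic tissue", "Liver, adult"))
vocab.add_term("developmental stage", "fetal", synonyms=("foetal", "fetus"))

liver_label = vocab.normalize("tissue type", "Hepatic Tissue")
unknown_label = vocab.normalize("pathology", "not in vocabulary")
```

In this sketch, differently worded labels from separate sources resolve to one canonical term, while unrecognized labels return `None` so they can be flagged for curation rather than silently guessed.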
While expression state information for these categories can be obtained from various data sources, Bardien-Kruger said that the lack of a standardized nomenclature leads to problems with accessing, collating, and organizing the information.
"These differences are magnified in the context of high-throughput systematic analyses and emphasize the need for consistent 'across-database' descriptions of the same terms and objects," said Bardien-Kruger.
In line with the controlled vocabulary, Hide is developing ExScript, an exon expression language. Hide said the language developed from "the need to have a sequence-level descriptor of expression as compared to database entries such as GenBank descriptions of the genome."
Hide envisions ExScript as a dynamic way to describe the set of isoforms that a gene produces. The language will refer to the genomic sequence of each gene and the boundaries of its expressed exons, as well as to an underlying expression record. He plans to use the standardized terms for tissue and expression state descriptions within SANBI’s STACKdb.
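ExScript's actual syntax is not described here, so the following Python sketch is not ExScript. It only illustrates, under stated assumptions, the kind of record such a language would need to capture: a reference to a genomic sequence, the boundaries of the expressed exons, and an attached expression record. All class names, field names, identifiers, and coordinates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Exon:
    start: int  # assumed convention: 1-based, inclusive genomic coordinate
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("exon start must not exceed exon end")


@dataclass
class Isoform:
    gene: str          # hypothetical gene identifier
    genomic_ref: str   # hypothetical identifier of the genomic sequence
    exons: list        # Exon objects in transcript order
    expression: dict = field(default_factory=dict)  # vocabulary category -> term

    def spliced_length(self):
        """Total length of the expressed exons under the inclusive convention."""
        return sum(exon.end - exon.start + 1 for exon in self.exons)


def shared_exons(a, b):
    """Exons with identical boundaries in both isoforms, ordered by start."""
    return sorted(set(a.exons) & set(b.exons), key=lambda exon: exon.start)


# Invented example: two isoforms of one gene, the second skipping the middle exon.
full = Isoform("GENE1", "example_contig",
               [Exon(100, 200), Exon(300, 380), Exon(500, 650)],
               {"tissue type": "liver"})
skipped = Isoform("GENE1", "example_contig",
                  [Exon(100, 200), Exon(500, 650)],
                  {"tissue type": "brain"})

common = shared_exons(full, skipped)
```

Keeping exon boundaries and expression terms in the same record is one way to relate isoforms of a gene to the conditions in which each was observed; a real design might store the expression record separately and link to it.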
Hide is seeking community involvement to develop ExScript as an open source project.
"Quite a number of groups have contacted us with regards to ExScript and we expect more interaction as the project grows," Hide said.
Bardien-Kruger said that the MGED group is open to the expression state vocabulary.
"There is currently a lack of tools that would enable human geneticists to annotate disease candidates in a systematic and comprehensive manner," Bardien-Kruger said.
Hide said that a language such as ExScript would enable a better understanding of the relationships between expression products, "particularly in light of the diversity of transcribed products from the genome and the lower-than-expected number of human genes."