NEW YORK (GenomeWeb) – This week, the Pistoia Alliance, a non-profit alliance of life science organizations, announced that it is launching the Ontologies Mapping project, which aims to develop better tools and establish best practices for ontologies used in life sciences research and development.
The standardized terms that make up ontologies or controlled vocabularies are important in research and development data management because they improve researchers' ability to explore and analyze large amounts of complex data. The problem, however, is that researchers use various ontologies to annotate their datasets, some of which duplicate each other and some of which use different terms for the same data point, Richard Holland, Pistoia's executive director of operations, explained to GenomeWeb this week. Varying vocabularies make combining and analyzing these datasets in tandem more challenging because it can be quite difficult to determine which terms across ontologies are equivalent.
To help ease data integration pains and enable researchers to conduct searches across datasets, members of the ontology mapping group will work on creating standardized guidelines, tools, and services that will make it possible to map terms across ontologies. It's akin to providing a thesaurus that researchers can use to look up terms to see whether they mean the same thing across datasets, Holland said.
Full details of the project's business plan are available on Pistoia's Interactive Project Portfolio platform (IP3) website. The alliance has tapped Ian Harrow, an independent bioinformatics and text mining consultant, to manage the project and assemble the team that will work on the standards. The actual team is still being assembled, but according to the IP3 site an interest group has formed around the project that includes researchers from pharmaceutical firms such as Merck, Boehringer Ingelheim, and Roche, as well as from the US National Institutes of Health and the Jackson Laboratory. Also, members of the alliance have pledged $125,000 in funding for the first phase of the project, according to the IP3 site.
The first phase, which will run until the end of 2015, will focus on defining specific requirements for a standardized tool that will allow users to integrate, understand, and analyze their data more effectively. The group will also pick one or two research domains, and by extension a few ontologies, to use as a testbed for the standards they develop, Holland said. For Phase II of the project, which will launch next year depending on funding availability, the group will focus on actually creating some of the frameworks and tools described in Phase I, he said. An expert community, made up of alliance members and possibly non-members, will supply ongoing guidance throughout the project, and all tools developed under the auspices of the project will be released free of charge under an open source license.
Other Pistoia projects include the Hierarchical Editing Language for Macromolecules (HELM), which aims to provide an open source software toolkit and editor for representing complex biomolecules such as proteins, nucleotides, and antibody drug conjugates. The initial HELM project wrapped up last year, but the alliance is continuing to build on the newly minted standard through a new project dubbed OpenHELM. A second project, Controlled Substances Compliance Services (CSCS), which is currently ongoing, focuses on developing IT solutions that would enable researchers to keep track of legislation covering controlled substances.
Meanwhile, the alliance also has a project whose goal is to seek out collaboration and partnership opportunities with like-minded organizations, Holland said. It is working on establishing an expert community that will be responsible for mapping the landscape of industry associations and alliances, along with their respective domains and scopes, and making this information available to members, who can then follow up as it suits them best.
The alliance also plans to launch a startup challenge this year that will offer a prize for the best business plan submitted by a biotech startup, as determined by a panel of judges; more details on that will be available in the next few weeks. The alliance is also mulling a second contest focused on interactive genome analysis that will compare various methods for analyzing genomic data faster, but that's a much more distant plan, he said.
Also, the alliance is currently fundraising for two new projects: one that will focus on standardizing disparate data warehouses to make them more consistent and compatible, and a second that seeks to make incident reports from chemistry laboratories readily available to help avoid repeat accidents, Holland said.
Other planned projects include one focused on contributing antibody structures to the Protein Data Bank, Holland said. The alliance is also exploring a possible collaboration with the developers of cBioPortal, a web-based software system for cancer genomics developed at Memorial Sloan Kettering Cancer Center.