Two new community-curated genomics projects, one involving WormBase and the other a new project called WikiGenes, join a cluster of Web 2.0-oriented projects unveiled in recent months, such as WikiProteins, Gene Wiki, and WikiPathways.
WikiGenes, envisioned as a general repository of community-contributed biological information, was recently described in the September issue of Nature Genetics and is available online.
Developed by Massachusetts Institute of Technology’s Robert Hoffmann, who is a Branco Weiss Fellow, WikiGenes has a strong emphasis on authorship and attribution that is intended to set it apart from other wiki-based bioinformatics projects. The system deploys an “authorship-tracking” methodology that links every contribution “unambiguously” to its contributor, which is a key requirement for scientific information, Hoffmann wrote in the Nature Genetics paper.
The developers of WormBase, meanwhile, are looking to add Web 2.0 functionality to a resource that was developed before there was even a Web 1.0.
Todd Harris, senior bioinformatics developer at Cold Spring Harbor Laboratory and WormBase project manager, told BioInform that the official model organism database for C. elegans and related nematodes, created in the early 1990s, just had its funding renewed for another five years and that a key aspect of this phase of the project will be adding Web 2.0 functionalities to the longstanding resource.
These features will be rolled out one at a time as part of the existing database, rather than as a complete remodel of the resource, Harris said.
Investigators on the WormBase project include Paul Sternberg at the California Institute of Technology, Lincoln Stein of CSHL and the Ontario Institute for Cancer Research, Washington University’s John Spieth, and the Wellcome Trust Sanger Institute’s Richard Durbin.
Harris said that the project’s developers believe that as participatory media grow in popularity, genomics resources stand to gain from capturing that functionality.
WormBase has evolved in step with the evolution of the Internet. It began as a desktop application, but “when the Web came around in the mid-90s, it became a web-based resource,” with a team of curators extracting facts from the literature, ensuring data integrity for the scientific community, and entering data into the database, Harris said.
One goal now is to make the annotation process easier and to develop tools that WormBase and other model organism resources can employ. There will be a programming interface to allow other sites to use WormBase data, and Web 2.0 functionality such as tagging and expanded wikis.
Harris said that even before the term “Web 2.0” was coined, he and his WormBase colleagues explored user-based functionality, such as wikis and soliciting community submissions of data. “People really submit and share data, but it is nowhere near as much as you would hope.”
Harris said that he is keeping an eye on social networking technologies and exploring how WormBase can leverage third-party tools to simplify community annotation. One challenge, he said, is that the solution must work for experts who are performing genome annotation as well as for casual users with data to share. Right now, Harris said, he is thinking most about helping casual users participate in the annotation process.
The engine for WormBase, the object-based AceDB database system, will remain the same. Although Harris said that AceDB is a little “arcane,” it turns out that it is very well suited for Web 2.0 functionality because “instead of having complex report pages, you expose all of the underlying facets of the database to the user and you let them decide what a gene page [should] look like to them.”
While there has been a trend toward standardization in bioinformatics, scientists prefer “carving out their own niche,” he said. “The beauty of Web 2.0 is that we can still have that diversity in the low-level architecture but we can still have interoperability of data and databases,” he said.
In addition, he said, the WormBase developers plan to build “an extensive web service interface to our database.”
Web services let users employ simple methods to get data out of a database. “If every database had a simple web service interface, that would facilitate the idea of a mashup, where researchers can pull data from one database and another and mix it up in interesting ways,” he said.
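The mashup idea Harris describes can be sketched in a few lines. This is only an illustration: the two fetch functions, the gene identifier, and the record fields are hypothetical stand-ins, not WormBase's or any other database's actual interface, and they return canned data rather than making network requests.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for two simple web services. A real service would
# answer an HTTP request; these return canned records so the sketch runs
# offline. The gene ID and field names are invented for illustration.
def fetch_gene_summary(gene_id: str) -> Dict[str, str]:
    records = {"gene-1": {"name": "example-1", "description": "a kinase"}}
    return records.get(gene_id, {})


def fetch_expression(gene_id: str) -> Dict[str, str]:
    records = {"gene-1": {"tissue": "neuron"}}
    return records.get(gene_id, {})


def mashup(gene_id: str, *services: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Merge the records each service returns for a single gene."""
    combined: Dict[str, str] = {"gene_id": gene_id}
    for service in services:
        # Later services overwrite earlier ones if field names collide.
        combined.update(service(gene_id))
    return combined


result = mashup("gene-1", fetch_gene_summary, fetch_expression)
```

Because each source exposes the same simple call, a third database could join the mix by passing one more function to `mashup`.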
Users will be able to use these services to fetch any bit of information from WormBase. “The page now is very monolithic but it is broken into sections. [In the future] each of the sections will be customizable,” Harris said. Users will be able to pick the sections they want to see on a page, the order in which they appear, or a Blast threshold of their choosing.
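A rough sketch of what user-selected sections could look like follows. The section names, the sample data, and the `layout_page` function are assumptions made for illustration and do not reflect how WormBase is actually built.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical default section order for a gene page.
DEFAULT_SECTIONS: List[str] = ["overview", "sequence", "expression", "homology"]


def layout_page(
    gene_data: Dict[str, str], preferred: Optional[List[str]] = None
) -> List[Tuple[str, str]]:
    """Return (section, content) pairs in the order the user asked for.

    Sections the user did not list are left out, and requested sections
    with no data are skipped.
    """
    order = DEFAULT_SECTIONS if preferred is None else preferred
    return [(name, gene_data[name]) for name in order if name in gene_data]


# Invented sample data for one gene page.
gene_data = {
    "overview": "a kinase",
    "sequence": "ATGC",
    "expression": "neuron",
    "homology": "none recorded",
}

# A user who wants only two sections, expression first.
custom = layout_page(gene_data, ["expression", "overview"])
```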
Displaying complex datasets such as those from next-generation sequencing in traditional browsers is a challenge, but, he said, most users want “distilled” data. “They want to see a track on a genome browser, a histogram or a density plot where the data has already been processed into a format that you can make a sense of.”
One feature Harris is designing for WormBase is borrowed from Web 2.0 sites like Flickr and Del.icio.us that let users add descriptive terms to any object.
“I built a little expression pattern database using Flickr,” he said. “There will be public and private tags so people can add their own annotations for themselves.”
Building on the idea of collective intelligence, scientists can add annotation tags such as “this is a kinase,” or “this is a really big gene, I spent five years trying to clone this,” he said. What Flickr does right, he said, is offer simplicity with a clean interface and intuitive navigation. “There is an emphasis on content in biological databases but sometimes that comes at the expense of design,” he said, adding that he also does not want the design to obscure the content.
One aspect Harris added to the Flickr concept is authorship. In WormBase users must be logged in to add tags, which can be followed so users can discern who placed them. “I think you really do need some way to recognize people,” he said, though he noted that this model is not likely to replace traditional peer review any time soon.
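The tagging model described here (login required, every tag traceable to its author, and a choice between public and private tags) might be sketched as follows. The `Tag` and `TagStore` names and their fields are hypothetical and are not drawn from WormBase's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    target: str          # identifier of the tagged object (hypothetical)
    text: str            # the free-text annotation
    author: str          # who placed the tag
    public: bool = True  # private tags are visible only to their author


class TagStore:
    def __init__(self) -> None:
        self._tags: List[Tag] = []

    def add(self, target: str, text: str, author: Optional[str], public: bool = True) -> Tag:
        # Assumption: an empty or missing author means "not logged in".
        if not author:
            raise PermissionError("users must be logged in to add tags")
        tag = Tag(target, text, author, public)
        self._tags.append(tag)
        return tag

    def visible_to(self, target: str, viewer: Optional[str]) -> List[Tag]:
        """Public tags on the target, plus the viewer's own private tags."""
        return [
            t for t in self._tags
            if t.target == target and (t.public or t.author == viewer)
        ]


store = TagStore()
store.add("gene-1", "this is a kinase", author="alice")
store.add("gene-1", "recheck this clone", author="bob", public=False)
```

In this sketch, "alice" sees only the public tag on `gene-1`, while "bob" sees both, and each tag carries its author's name.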
“If it were something with which people could walk into a tenure review and say, ‘I made 1,000 edits to this community site,’ it would be very interesting, but I think [that] is a long ways away.”
WormBase is also planning more community forums and an embedded wiki. “We are still kicking around the idea of having people click on an individual piece of data and edit it or correct it. There are some real technical [challenges] to that as well as editorial challenges,” he said.
If implemented, this new wiki would likely differ from the current unstructured wiki model. “The problem I think with wikis is that you need to have a critical mass of people looking at it,” Harris said. “I don’t know what that number is, but if you have too few people and somebody writes some bone-headed statement somebody might not see it for two years.” That could even cast doubt on the entire resource, he said.
Harris said that a demonstration of WormBase’s new features will become available shortly after he and his colleagues migrate the database from CSHL to the Ontario Institute for Cancer Research, which is occurring over the next few weeks.
WikiGenes, meanwhile, is a new Web 2.0-oriented genomics resource that has authorship as a cornerstone concept.
Despite its similar name, WikiGenes is not connected to Gene Wiki, which was developed to encourage scientists to contribute information about specific genes to Wikipedia [BioInform 07-11-08]. Rather, WikiGenes is a standalone resource that aims to serve as “a rigorous scientific tool” for collecting and presenting user-contributed biological information.
WikiGenes developer Hoffmann told BioInform that in the few weeks since the Nature Genetics paper was published, the site has clocked 50,000 visits, and several scientists have registered and begun editing the resource. He said the number of registered scientists was not available.
The Swiss entrepreneur and philanthropist Branco Weiss funds the project.
Hoffmann previously developed iHOP, or information hyperlinked over proteins, a text-mining tool that extracts information from PubMed abstracts and combines it into overview articles.
WikiGenes was launched with around 100,000 of these iHOP articles, which act as “a matrix or substrate for this early collaborative phase,” Hoffmann said. “People shouldn’t be shy about editing it, we need these pioneers.”
When scientists publish in established peer-reviewed journals, he hopes they will find it useful to also publish in this more visible and possibly “more future-oriented” resource. “The whole thing is an experiment,” he said. “It is really up to the scientists, to my colleagues, to make something out of it.”
“Most importantly by adding your own research results, whatever you know about the genes, the chemicals or the diseases, by adding it into WikiGene you contribute to a comprehensive summary about the gene,” Hoffmann said.
Information about genes is currently dispersed over, in some cases, thousands of articles in different journals and different types of media. “If you want to reverse engineer this, or simply to read all this, it is a pain to find out about one single gene,” he said.
Scientists are performing ever larger experiments testing for thousands of genes at the same time. “If there are genes showing up you have never heard of, then it is very important to have a resource where you can see what these genes do. That is the secondary benefit of WikiGenes. You benefit with your bits of information, but if lots of people do that then we have a unique resource which will be beneficial to everyone.”
While the National Center for Biotechnology Information and the European Bioinformatics Institute offer many high-quality, curated resources, these resources are not comprehensively integrated or unified, he said. On WikiGenes, he explained, scientists can collaborate to create an overview of any particular gene, obviating the need to go to many different articles.
Say It, Edit It
Whether scientists will participate in a wiki-based annotation project is a question of the perceived benefit of contributing, Hoffmann said. “Will they be seen by others, is this citable, will this enhance my interactions with other scientists?” are among the questions that researchers will ask as they consider whether to contribute, he said, adding that WikiGenes’ focus on attribution should attract the interest of scientists.
“If you have the option between an anonymous wiki and a wiki where you get due credit for your contributions, this choice is very obvious — there will be no way back to anonymous wikis,” Hoffmann said.
For everyone who has original contributions, authorship matters, he said, which is unlike classic Wikipedia with its encyclopedic — and anonymous — approach. Users must register to use WikiGenes and the resource displays all user contributions.
In the case of a dispute in Wikipedia, editors make final decisions. “Those are top-down decisions,” he said. But with thousands of articles, that is not practicable. For that reason, WikiGenes relies on a “reputation system” that lets users rate each contribution.
The reputation system “enables readers to report author-specific vandalism and therefore also functions as an immune system, preserving the interests of the community,” Hoffmann wrote in the Nature Genetics paper. “On the other hand, especially committed authors could enhance their reputation and assume more responsibility, for instance, in the arbitration of editing conflicts.”
The key to this approach, Hoffmann said, is authorship tracking. “Without authorship you can’t develop a reputation system,” he said.
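Why reputation depends on authorship can be shown with a minimal sketch. The scoring rule here, a plain average of reader ratings per author, is an assumption chosen for simplicity; neither Hoffmann's remarks nor the quoted passage from the paper specifies how WikiGenes computes reputation, and the names below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Contribution:
    author: str  # every contribution is linked to its contributor
    text: str
    ratings: List[int] = field(default_factory=list)  # reader ratings


def author_reputation(contributions: List[Contribution]) -> Dict[str, float]:
    """Average all ratings received by each author (assumed scoring rule)."""
    by_author: Dict[str, List[int]] = defaultdict(list)
    for c in contributions:
        by_author[c.author].extend(c.ratings)
    # Authors with no ratings yet are left out rather than scored as zero.
    return {a: sum(r) / len(r) for a, r in by_author.items() if r}


contributions = [
    Contribution("alice", "gene X encodes a kinase", ratings=[5, 4]),
    Contribution("alice", "gene X is expressed in neurons", ratings=[3]),
    Contribution("bob", "unsupported claim", ratings=[1]),
    Contribution("carol", "new, unrated note"),
]
reputation = author_reputation(contributions)
```

Without the `author` field there would be nothing to group the ratings by, which is the point Hoffmann makes about authorship tracking.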
WikiGenes also departs from one of the Wikipedia “pillars,” or values, which states that articles cannot involve original research. “We want original research,” Hoffmann said, explaining that there is a section on WikiGenes for creating articles that are only editable by authors who have been invited to edit.
Hoffmann acknowledged that the recent flurry of wiki-based annotation projects, which include Gene Wiki, WikiProteins, and WikiPathways, presents some competition. “You can’t hide from that,” he said.
Andrew Su, senior research investigator in the computational biology group at the Genomics Institute of the Novartis Research Foundation who heads the Gene Wiki effort, said that the sudden burst of wiki projects is likely due to “convergent evolution where a lot of people thought of similar ideas.”
Ideally, Su said, scientists will go where their colleagues are going. “Ultimately the ones that really take [off] will be decided by the community. It’s people voting with their participation.”
One “disadvantage” for WikiGenes “is that it is very text-based,” Su said, while Gene Wiki offers the possibility of graphic visualization and images. In addition, without a concern for attribution, scientists can “write something that sounds like a readable document with a reasonable amount of prose as opposed to a series of disjointed statements,” said Su. Gene Wiki uses full sentences such as those found on other Wikipedia pages whereas WikiGenes presents more sentence fragments to users.
Explaining the WikiGenes approach to the language of scientific entries, Hoffmann said, “When humans want to express new thoughts, they don’t want to be limited by database schemas to express themselves. That is why entering information for a human is not much fun. You need to define the relationship from an ontology.”
Referring to another Web 2.0-based effort called WikiProteins, Hoffmann said it is “well-founded” and defines the relationships between genes, proteins, and chemicals in a machine-readable way, which makes the information itself machine searchable. WikiGenes is more readily machine readable than Gene Wiki, he said.
“We have all taken slightly different models toward how to harness the idea of community intelligence,” Su said, noting that no one project claims to have found the only answer.