Researchers at the Scripps Research Institute recently demonstrated a method of finding links between genes, diseases, and SNPs by creating "meta-wikis" that combine information from multiple bio-wikis — community-generated resources containing scientific and medical data.
The project, a collaboration between Scripps and the Dresden University of Technology, demonstrates the usefulness of wikis in answering biomedical questions, according to Andrew Su, an associate professor in the department of molecular and experimental medicine at Scripps and one of the study's authors.
Co-author Benjamin Good presented the project at the Bio-Ontologies special interest group meeting held before last month's Intelligent Systems for Molecular Biology conference.
The team created a mashup of two wiki-based resources, Gene Wiki and SNPedia, to identify connections between genes, SNPs, and diseases.
Gene Wiki, created by Su and his colleagues, aims to create seed articles for each human gene that will be annotated and expanded by community members.
Gene Wiki "embraces this idea that anybody in the community can contribute to the synthesis and the curation of the biomedical literature and help create these gene wiki articles," Su explained to BioInform. "The final vision is that we will have one of these review articles that’s continuously updated, that’s collaboratively written for every gene in the human genome. It’s a potentially powerful resource for biology."
SNPedia, a similar wiki-based site, is a repository of articles on SNPs that includes short descriptions, links to scientific articles and personal genomics websites, and microarray information.
The bio-wiki trend emerged several years ago as researchers turned to the wiki platform in an effort to compile and curate information that might otherwise be lost in the rapidly expanding sea of scientific literature (BI 1/2/2009).
A number of bio-wikis focused on different biological subdisciplines have since emerged, such as Proteopedia for protein structures, WikiPathways for biological pathways, and WikiGenes for gene-based information. While these efforts have become "important concept-centric knowledge resources," the investigators noted in their abstract for the Bio-Ontologies presentation that "no single wiki contains all of the knowledge needed to answer most biological questions."
In response, Su and colleagues are looking to create meta-wikis that will provide researchers with information from disparate wikis in a single interface.
Making this technology possible, the authors wrote in their abstract, requires three things: a single application programming interface across multiple bio-wikis, "standardized systems for describing and recognizing biological concepts," and the addition of semantic extensions.
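Wikis built on MediaWiki all expose the same web API through their api.php endpoint, which is one concrete way a single interface across several bio-wikis can work. The article does not say how the team retrieved its articles, so the following is only a minimal sketch under that assumption: it builds standard MediaWiki query URLs that request the current wikitext of a list of pages. The endpoint and page titles are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# MediaWiki limits most clients to 50 titles per query request.
MAX_TITLES_PER_REQUEST = 50


def build_page_requests(api_base, titles):
    """Return api.php URLs that fetch the current wikitext of the given pages.

    api_base is a placeholder such as "https://example.org/w/api.php".
    Titles are split into batches so no URL exceeds the per-request limit.
    """
    urls = []
    for start in range(0, len(titles), MAX_TITLES_PER_REQUEST):
        batch = titles[start:start + MAX_TITLES_PER_REQUEST]
        params = {
            "action": "query",        # read-only query module
            "prop": "revisions",      # ask for revision data...
            "rvprop": "content",      # ...specifically the page text
            "rvslots": "main",        # main content slot (MediaWiki 1.32+)
            "titles": "|".join(batch),
            "format": "json",
            "formatversion": "2",
        }
        urls.append(api_base + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    api = "https://example-bio-wiki.org/w/api.php"  # hypothetical endpoint
    for url in build_page_requests(api, ["CDK2", "Rs53576"]):
        print(url)
```

Because the same parameters work against any MediaWiki installation, pointing the sketch at a different wiki only means changing `api_base`.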
To demonstrate, the team created a mashup of SNPedia and Gene Wiki.
The first step was to create a new wiki by installing MediaWiki, the software widely used to build bio-wikis, along with the semantic extension. The team then pulled articles from Gene Wiki and SNPedia and inserted them into the meta-wiki.
Next, using the National Center for Biomedical Ontology's annotator, the team identified disease ontology terms in Gene Wiki articles and then mapped medical conditions in SNPedia to those terms. Finally, they added semantic links between Gene Wiki genes and SNPs; genes and diseases; and SNPs and diseases.
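The linking step amounts to chaining relationships: a gene linked to a SNP that is in turn linked to a disease yields a gene-disease pair, which can then be pooled with the pairs found directly in Gene Wiki. The following is a minimal sketch of that idea, not the team's code. It assumes SNPedia conditions have already been mapped to Disease Ontology IDs, assumes the semantic extension is Semantic MediaWiki with its `[[Property::Value]]` annotation syntax, and uses hypothetical property names, identifiers, and toy data throughout.

```python
def infer_gene_disease_via_snps(gene_snps, snp_diseases):
    """Chain gene->SNP and SNP->disease links into a set of (gene, disease) pairs."""
    pairs = set()
    for gene, snps in gene_snps.items():
        for snp in snps:
            for disease in snp_diseases.get(snp, ()):
                pairs.add((gene, disease))
    return pairs


def smw_annotations(gene, snps, diseases):
    """Render Semantic MediaWiki [[Property::Value]] annotations for one gene page.

    The property names "Has SNP" and "Associated with disease" are hypothetical;
    a real meta-wiki would define its own properties.
    """
    lines = [f"[[Has SNP::{snp}]]" for snp in sorted(snps)]
    lines += [f"[[Associated with disease::{d}]]" for d in sorted(diseases)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data with hypothetical identifiers, for illustration only.
    gene_snps = {"GENE_A": {"rs0001"}, "GENE_B": {"rs0002"}}
    snp_diseases = {"rs0001": {"DOID:X1"}, "rs0002": {"DOID:X2"}}
    gene_wiki_pairs = {("GENE_A", "DOID:X1"), ("GENE_C", "DOID:X3")}

    snpedia_pairs = infer_gene_disease_via_snps(gene_snps, snp_diseases)
    combined = gene_wiki_pairs | snpedia_pairs   # union of both sources
    shared = gene_wiki_pairs & snpedia_pairs     # pairs found in both
    print(len(combined), "combined;", len(shared), "shared")
    print(smw_annotations("GENE_A", gene_snps["GENE_A"], {"DOID:X1"}))
```

Representing each relationship as a tuple in a set makes the union and the overlap between the two sources one-line set operations.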
Their wiki-mashup succeeded in unearthing more evidence of links between genes and diseases than either resource contains on its own, according to the investigators.
Specifically, it identified 4,426 gene-disease relationships. Of that number, 1,037 gene-SNP-disease connections were found only in SNPedia, while Gene Wiki accounted for 3,525 gene-disease associations.
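The three figures can be related through inclusion-exclusion, provided the 3,525 and 1,037 figures are read as the relationships contributed by Gene Wiki ($G$) and SNPedia ($S$) respectively. That reading is an interpretation, since the article does not state the overlap directly:

```latex
|G \cup S| = |G| + |S| - |G \cap S|
\quad\Rightarrow\quad
4426 = 3525 + 1037 - |G \cap S|
\quad\Rightarrow\quad
|G \cap S| = 136
```

Under that reading, only 136 gene-disease relationships appear in both resources.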
Each resource contains a large amount of information not found in the other, which the researchers attributed in part to differences in how the source material used to populate them was mined, as well as to differences in the nature of their core content. For example, a gene may be involved in a disease pathway but have no known associated SNPs.
However, they concluded that the small overlap between the two resources "indicates the potential value of their combination."
Increasing Participation, Improving Content
Speaking at the SIG meeting, Su noted that bio-wikis can help researchers keep up to date on a rapidly growing body of biomedical literature.
"Some of the most successful ... bioinformatics analyses are based [on] formal gene annotations ... but the way we generate those gene annotations is essentially, small groups of curators who are trying to read the biomedical literature," he told BioInform. However, at the rate articles are published, "any sort of centralized group of researchers is just not going to be able to keep up with how fast the science is being done today."
On the other hand, keeping the content in these resources up to date requires a "critical mass" of readers and contributors, which isn't always easy to attain.
Nigam Shah, an assistant professor at Stanford University and one of the SIG's organizers, told BioInform that despite the rise in popularity of bio-wikis in recent years, it's still difficult to get community-created annotations primarily because there simply aren’t any incentives to do it. As a result, readers far outnumber contributors.
"The incentive in Wikipedia is altruism," Shah said, but "in scientific communities you need some way of assigning credit, some way of having your citation count ... if that can be worked out then the ratio of contributors to researchers would go up."
Su agreed that addressing incentives is an important issue, but noted that the lack of formal incentives hasn’t prevented the Gene Wiki from growing. There are currently more than 10,000 Gene Wiki articles and the site gets about 5 million page views and around 1,000 edits per month, Su said.
Still, "the more we build in formal incentives, I think [it will] improve the rate at which the Gene Wiki will grow and [will] improve the quality of the content," he said.
To that end, the group is setting up a partnership with the Elsevier journal Gene to copublish gene-centric review articles in the journal and Gene Wiki.
"The idea [is] that the peer-reviewed article would be the article of record," he explained. "That’s the article that people will cite … [and] it carries the same weight as a peer-reviewed publication" while the Gene Wiki version "becomes a living document ... that gets updated over time; it stays up to date with new findings."
As the body of knowledge in community wikis grows, Su's team is looking to extract structured gene and disease annotations from the free text in these resources. Such annotations could be fed back into the source databases and used for statistical analyses.
During his keynote, Su described one such project, in which his group used NCBO Annotator to look for disease, drug, and other biomedical concepts in Gene Wiki articles and then sent 6,319 candidate gene ontology annotations and 2,147 candidate disease ontology annotations to expert curators, who reviewed their quality.
The candidate disease ontology annotations had an overall specificity of 90 to 93 percent, while the gene ontology annotations had an overall specificity of 46 to 64 percent.
"Overall it suggests that the community-generated content in the wiki combined with very basic text-mining tools can be used to propose structured gene annotations that are very useful for analysis of genome-wide experiments," Su said.
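The candidate-generation step Su described can be illustrated with simple dictionary-based concept recognition: scan each gene's article for known ontology term labels and record each hit as a candidate (gene, term) annotation for curators to check. The sketch below is a simplified stand-in, not NCBO Annotator's implementation. The term dictionary, IDs, and function names are hypothetical, and `accepted_fraction` simply reports the share of candidates curators accepted, because the article does not define how its specificity figures were computed.

```python
import re


def build_matcher(term_index):
    """Compile one case-insensitive, whole-word regex over all term labels.

    Longer labels come first in the alternation, so "breast cancer" is
    preferred over "cancer" when both start at the same position.
    """
    labels = sorted(term_index, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def candidate_annotations(articles, term_index):
    """Return a set of (gene, term_id) candidates found in each gene's article text."""
    if not term_index:
        return set()
    matcher = build_matcher(term_index)
    by_lower = {label.lower(): term_id for label, term_id in term_index.items()}
    candidates = set()
    for gene, text in articles.items():
        for match in matcher.finditer(text):
            candidates.add((gene, by_lower[match.group(0).lower()]))
    return candidates


def accepted_fraction(candidates, judged_correct):
    """Share of candidates that curators marked correct (0.0 if there are none)."""
    if not candidates:
        return 0.0
    return len(candidates & judged_correct) / len(candidates)


if __name__ == "__main__":
    # Hypothetical term dictionary and article text.
    terms = {"breast cancer": "DO_HYPOTHETICAL_1", "cancer": "DO_HYPOTHETICAL_2"}
    articles = {"GENE_A": "Mutations in GENE_A are linked to Breast Cancer."}
    print(candidate_annotations(articles, terms))
```

Plain dictionary matching like this is fast but cannot judge whether a mention describes a real gene-disease relationship, which is one reason the candidates still need expert review.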
Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.