
Xennex Eyes IP Law Firms as Promising Growth Market for GeneCards

During 2008, wiki-based collaborations gained a foothold in molecular biology with the launch of a number of wiki-based annotation projects such as WikiGenes, WikiProteins, Gene Wiki, and WikiPathways.
 
In the most recent example of this trend, the journal RNA Biology has decided to mandate Wikipedia entries from authors submitting papers to a new section on RNA families — a requirement that is, “as far as we are aware, a first for any scientific publication,” according to an editorial in the journal by Paul Gardner, the editor of the new RNA Families section, and Alex Bateman, the head of the Rfam database of RNA alignments and secondary structures.
 
The primary reason for requiring Wikipedia entries, Gardner and Bateman said, is that these pages are usually among the top-ranked hits in Google searches for molecular biology keywords. Their goal is to “ensure that the RNA-relevant information in Wikipedia is both reliable and current,” and they believe that time spent by experts will help “improve the record.” To that end, they said, “the Wikipedia update will be reviewed alongside the submitted article.”
 
The creation of Wikipedia entries is also likely to benefit the Rfam database, they said, because the resource currently draws annotations from Wikipedia, so any Wikipedia articles written for the journal “can be used directly by the database as well as the community.”
 
But even as wiki-based annotation gains in popularity, some in the bioinformatics community are questioning the value of this approach because there are very few tools that enable downstream data-mining of Wikipedia pages.
 
For example, Masanori Arita, a computational biologist at the University of Tokyo, published a paper in December in Briefings in Bioinformatics that called wiki-based web sites “overrated” as the solution for large-scale management and for resolving data inconsistency in bioinformatics.
 
The challenge, he wrote, is that wiki pages are “independent of each other,” so changes made on any one page are not replicated on pages with related information.
 
In an e-mail interview with BioInform, Arita said that as a frequent user of Wikipedia he finds that “the idea of community annotation is great” and that applications such as Gene Wiki, WikiGenes, WikiProteins, and WikiPathways “may achieve high-level annotations in every single page.” The problem, he said, is for “concepts that span multiple pages.”
 
This challenge stems from the lack of page dependency inherent to Wikipedia and its underlying MediaWiki software. “To keep the consistency of information, when an original page is updated, all its proper copies in other pages must also be updated,” he said. Currently, authors must duplicate information by cutting and pasting it from one page to another, Arita said.
 
A Growing RNA Family
 
The challenge of manual curation is one of the reasons behind RNA Biology’s new guidelines. Gardner and Bateman note in their editorial that Rfam’s alignments and structures are derived from the literature, but “due to a lack of standards for publishing RNA alignments and structures, often the curators resort to manually typing in the sequence and structure from published figures.”
 
This approach, they said, “is not going to scale well in an era of comparative genomics, deep sequencing of RNAs, and RNA gene prediction tools,” so they envision the deposition of these alignments and annotations in the journal’s RNA Families track as a means of building a standardized archive of this information.
 
According to the journal’s guidelines, submissions to the new RNA Families section are to focus on either “substantial updates of existing RNA families” or descriptions of novel ones. Authors are required to submit material to the journal and to Wikipedia. Landes Bioscience, the publisher of the journal, did not respond to queries by BioInform about this section or its new policy before deadline.
 


Gardner and Bateman said in their editorial that the journal’s new track is a forum for short publications that detail the structure, function, and sequence conservation for RNA families. “There will be two extra requirements for publication in this track,” the scientists wrote. One of the requirements is deposition of an alignment and secondary structure in Stockholm format. The other is the “generation or update of a corresponding entry in the online encyclopedia Wikipedia.”
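For readers unfamiliar with it, Stockholm is the plain-text multiple-alignment format used by Rfam: a `# STOCKHOLM 1.0` header, one `name sequence` line per aligned sequence, markup lines beginning with `#=` (the consensus secondary structure sits on a `#=GC SS_cons` line), and a `//` terminator. The following is a minimal Python sketch, not a complete parser, that reads a toy alignment and pairs up the bracketed columns of `SS_cons`. The alignment is invented, and `parse_stockholm` and `base_pairs` are hypothetical helper names.

```python
# Toy alignment for illustration only: the names and residues are invented
# and do not represent any real RNA family.
TOY_STOCKHOLM = """\
# STOCKHOLM 1.0
#=GF ID toy_family
seq1          GGGAAACCC
seq2          GGCAAAGCC
#=GC SS_cons  <<<...>>>
//
"""


def parse_stockholm(text):
    """Return (sequences, ss_cons) from one Stockholm alignment.

    A sketch: #=GF, #=GS, #=GR and other #=GC lines are skipped, but
    interleaved blocks are concatenated per sequence name.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM 1.0"):
        raise ValueError("missing '# STOCKHOLM 1.0' header")
    sequences, ss_cons = {}, ""
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "//":        # end of this alignment
            break
        if not stripped:
            continue
        if line.startswith("#=GC SS_cons"):
            ss_cons += line.split()[-1]
        elif line.startswith("#"):
            continue                # other markup is ignored in this sketch
        else:
            name, seq = line.split()
            sequences[name] = sequences.get(name, "") + seq
    return sequences, ss_cons


def base_pairs(ss_cons):
    """Return sorted (i, j) column pairs from bracket notation.

    Only <>, (), [] and {} are paired; pseudoknot letters and gap or
    unpaired symbols are treated as unpaired.
    """
    closer_to_opener = {">": "<", ")": "(", "]": "[", "}": "{"}
    stacks = {opener: [] for opener in closer_to_opener.values()}
    pairs = []
    for j, ch in enumerate(ss_cons):
        if ch in stacks:
            stacks[ch].append(j)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at column {j}")
            pairs.append((stack.pop(), j))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket in SS_cons")
    return sorted(pairs)


seqs, ss = parse_stockholm(TOY_STOCKHOLM)
assert all(len(s) == len(ss) for s in seqs.values())
for i, j in base_pairs(ss):
    # Residues each sequence places in the paired columns i and j.
    print(i, j, [seqs[name][i] + seqs[name][j] for name in seqs])
# 0 8 ['GC', 'GC']
# 1 7 ['GC', 'GC']
# 2 6 ['GC', 'CG']
```

In the toy data, columns 2 and 6 hold a G-C pair in one sequence and a C-G pair in the other: a compensatory change that preserves the helix, which is the kind of covariation signal an alignment deposited together with its structure line can carry.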
 
According to the journal’s guidelines, the submission must include “at least one stub article” for Wikipedia about the RNA in question, to be added either to the author’s user space on Wikipedia, which the publisher describes as the “preferred route,” or to the main Wikipedia space.
 
RNA Biology offers open access to its articles one year after publication, and authors who want open access upon publication can pay a fee. Articles in the RNA Families track, however, will be published as open access texts both online and in print “at least in the first years of the track,” although articles with color figures or more than four journal pages will incur a fee. The Wikipedia articles are to be peer-reviewed along with the manuscript, the guidelines state.
 
The first RNA Biology article to be published in this fashion is “A survey of nematode SmY RNAs” by Peter Stadler of the University of Leipzig and the Santa Fe Institute, Sean Eddy of the Howard Hughes Medical Institute’s Janelia Farm Research Campus, and colleagues at the University of Vienna.
 
 
Wickedly Wiki
 
Andrew Su, Senior Research Investigator in the Computational Biology Group of the Genomics Institute of the Novartis Research Foundation, who spearheaded the Gene Wiki project, told BioInform that RNA Biology’s venture is “a great experiment worth trying.”
 
It is a “nice, discrete well-wrapped pilot project” in which the subject matter of the RNA wiki matches the subject matter of the journal, he said.
 
Scientists might create Wikipedia entries “to varying degrees of enthusiasm” and it may be “tough to have very well-defined criteria [as to what comprises a qualifying submission] but it at least requires the authors to make an effort,” he said. Overall Su does not fear that the added requirement will discourage scientists from submitting papers to the journal. “I don’t think it will be that much of a big deal,” he said.
 
“I'm most excited about the prospect of ‘community intelligence,’” he said, which underlies projects such as Gene Wiki, which was developed to encourage scientists to contribute information about specific genes to Wikipedia [BioInform 07-11-08], and other wiki-based collaborations in molecular biology.
 
Once in Wikipedia, data are visible and accessible. “Wikipedia provides that framework for the community to continually edit and summarize and improve these articles, and that's still relatively unique in biology,” he said.
 
Overrated
 
But the University of Tokyo’s Arita said that simply making this biological information available online is not enough. “In a sense, a wiki can be a good tool for data collection, nothing more,” he told BioInform.
 
“The essence of scientific activity is to organize and extract knowledge out of collected data.” Accumulating data itself is “not science,” Arita said. “We need a tool for knowledge management.”
 
In his Briefings in Bioinformatics paper, Arita noted that wiki-based websites are a poor substitute for structured databases because they lack a mechanism to check data consistency. “As long as wiki is used as a weblog or encyclopedia, this independency is more than natural: authors take the responsibility for the contents, and they should not be changed automatically by other contents,” he wrote. “For a database system, on the other hand, the ideal design is the opposite: original, consistent contents are managed in the background, and its update affects all views that users create through queries.”
 
Arita wrote that the “inherent lack of measure for checking consistency may be fatal for forward-thinking biologists who use wiki for the community-driven data management; however, this drawback seems often unnoticed.”
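If facts were recorded in machine-readable form, the consistency check Arita says wikis lack could look something like the following Python sketch. It is a hypothetical illustration, not his design: each page's facts are assumed to have already been extracted into key/value pairs (on a free-text wiki, that extraction is itself the hard part), and the page titles, gene names, and fact keys are invented.

```python
from collections import defaultdict

# Hypothetical pages, each holding its own independent copy of facts about a
# gene. Page titles, gene names and fact keys are invented for illustration.
pages = {
    "GeneA": {"GeneA.symbol": "GENEA", "GeneA.chromosome": "6"},
    "Pathway X": {"GeneA.symbol": "GENEA", "GeneA.chromosome": "6"},
    "Disease Y": {"GeneA.symbol": "GENEA-OLD", "GeneA.chromosome": "6"},
}


def find_inconsistencies(pages):
    """Return {fact: {value: [page, ...]}} for every fact the pages disagree on."""
    seen = defaultdict(lambda: defaultdict(list))
    for title, facts in pages.items():
        for fact, value in facts.items():
            seen[fact][value].append(title)
    # Keep only facts stated with more than one distinct value.
    return {fact: dict(values) for fact, values in seen.items() if len(values) > 1}


print(find_inconsistencies(pages))
# {'GeneA.symbol': {'GENEA': ['GeneA', 'Pathway X'], 'GENEA-OLD': ['Disease Y']}}
```

The stale symbol on the "Disease Y" page is the kind of drift Arita describes: in a plain wiki, nothing flags it until a reader happens to notice.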
 
As an example, he told BioInform, “Suppose a gene name is updated. I need to search all its occurrences in all pages and update them one by one.” A mechanism to propagate such an update, he said, would make the task much easier.
 
Arita said that he appreciates efforts to construct data tables in Wikipedia, such as an entry that lists cities by population, but said that “their maintenance would be extremely hard. Data in such pages are usually inconsistent with those in other pages. We need many such charts in science. How can we manage them on wikis?”
 
His suggestion is to create a hybrid structure that is part wiki and part database. “In fact, major wikis are built on relational database systems. In this perspective, wiki is only a sandbox inside a database,” he said.
 
The idea would combine the strengths of both, since “databases and wikis serve two different purposes. One is for structured, quantitative, or well-defined ideas, the other [for] unstructured, qualitative, or indeterminate ideas.” Arita said that he thinks a “half-structured, half-free formatted design is useful especially for biology research.”
 
One way to achieve this hybrid design, Arita suggested, is to follow the path being paved by Semantic MediaWiki, a semantic extension of the MediaWiki platform that organizes content, tags it, and allows users to browse and share it.
 
Another approach, which Arita called a more “straightforward” translation of the relational model into wiki pages, involves users embedding scripts in web pages. This method would achieve “more powerful search function, efficient page design, and most of all, we can propagate updates.” He used this approach in a website for research results on metabolism that his group runs with researchers at two other Japanese universities.
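One way to picture the update propagation Arita describes is a page body that stores references to database records instead of copies of their values. The Python sketch below uses the standard-library `sqlite3` module; the schema, the `{{gene:ID:field}}` placeholder syntax, and the `render` helper are hypothetical stand-ins for his embedded scripts, not his implementation. Changing a gene's symbol once in the table changes every page that renders it.

```python
import re
import sqlite3

# Hypothetical schema: structured facts live in a relational table, while page
# bodies stay free text containing {{gene:ID:field}} placeholders that are
# resolved at render time. Schema and syntax are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (id TEXT PRIMARY KEY, symbol TEXT, pathway TEXT);
    CREATE TABLE page (title TEXT PRIMARY KEY, body TEXT);
""")
conn.execute("INSERT INTO gene VALUES ('g1', 'GENEA', 'pathway X')")
conn.executemany("INSERT INTO page VALUES (?, ?)", [
    ("Gene page", "{{gene:g1:symbol}} participates in {{gene:g1:pathway}}."),
    ("Pathway page", "Members include {{gene:g1:symbol}}."),
])

# Only whitelisted column names can match, so formatting them into SQL is safe.
PLACEHOLDER = re.compile(r"\{\{gene:(\w+):(symbol|pathway)\}\}")


def render(title):
    """Render a page, filling each placeholder from the gene table."""
    (body,) = conn.execute(
        "SELECT body FROM page WHERE title = ?", (title,)).fetchone()

    def lookup(m):
        gene_id, field = m.group(1), m.group(2)
        row = conn.execute(
            f"SELECT {field} FROM gene WHERE id = ?", (gene_id,)).fetchone()
        return row[0] if row else m.group(0)  # leave unknown IDs visible

    return PLACEHOLDER.sub(lookup, body)


before = render("Pathway page")   # 'Members include GENEA.'
# One update in the table...
conn.execute("UPDATE gene SET symbol = ? WHERE id = ?", ("GENEA2", "g1"))
# ...is reflected on every page that references the gene.
after = render("Pathway page")    # 'Members include GENEA2.'
print(before, after, render("Gene page"), sep="\n")
```

The free-text part of each page stays as editable as any wiki page; only the facts that need to stay consistent are pulled from the table.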
 
“My wish is to organize a project team to design a next-generation wiki or cyberinfrastructure that can manage data integrity while maintaining good parts of the current wikis,” he said.
 
Arita told BioInform that he has begun discussions with researchers involved in the iPlant Collaborative and “implemented a small prototype of my idea.”
 
GNF’s Su said that Arita “raises some interesting points,” noting that the arguments he lays out are in line with those made for Semantic MediaWiki.
 
“Semantic MediaWiki would be fantastic in terms of its applicability to genetics and biology,” Su said.
 
Semantic MediaWiki, however, “has yet to find its really great application,” Su said. Although it is finding users, it is “nothing on the scale of Wikipedia,” which grew out of the need for an application that lets many users edit a text, with the software built “to satisfy that need,” he said.
 
Su acknowledged that after information has been entered into Wikipedia, pulling data out in a structured way and mining it is a significant hurdle. “Now all the data miners are saying Wikipedia is great, but it doesn’t allow me to do downstream data mining.”
 
Researchers are cognizant of some of these shortfalls, Su said, but when scientists choose to distance their project from Wikipedia, they risk losing visibility. “A one-off wiki solution can easily languish without a user base, which is one reason why we are going with Wikipedia,” he said, referring to Gene Wiki.
 
“The semantic part really bothers people who are trying to get data out of Wikipedia or out of the Gene Wiki to do downstream data mining,” he said. “But everything to this point has been about encouraging people to get data into the Gene Wiki and into Wikipedia, and that is where you don’t really care about Semantic MediaWiki.”
 
In Gene Wiki, for example, a change on the page dedicated to the gene utrophin will not propagate to pages about other genes that are associated with the cytoskeleton, Su said. Even a hyperlink lacks context about the relationships between genes, so searching for all genes connected to the cytoskeleton “is very difficult right now.”
 
“You don’t know, for example, does utrophin promote cytoskeletal development or does it promote cytoskeletal destruction, or is it involved in disease processes related to the cytoskeleton, or is it just a link to another concept?” Su said. With Semantic MediaWiki, users or wiki programmers such as his colleagues in the Gene Wiki project would be able to launch those types of queries, he added.
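Semantic MediaWiki addresses Su's point by letting authors type a link: instead of a plain `[[Target]]`, a page can say `[[Property::Target]]`. The Python sketch below only emulates that idea and is not how Semantic MediaWiki itself stores or queries annotations. The gene names, property names, and helper functions are hypothetical, and the genes are placeholders rather than utrophin or any real gene.

```python
import re

# Semantic MediaWiki writes a typed link as [[Property::Value]], optionally
# [[Property::Value|shown text]]. Pages, genes and properties are hypothetical.
pages = {
    "GeneA": "GeneA promotes [[Promotes::Cytoskeletal development|cytoskeletal "
             "development]] and is found in the [[Located in::Cytoskeleton|cytoskeleton]].",
    "GeneB": "GeneB inhibits [[Inhibits::Cytoskeletal development|cytoskeletal development]].",
    "GeneC": "GeneC is mentioned with a plain link to the [[Cytoskeleton]].",
}

# Property name (no brackets, pipes or colons), "::", value, optional "|text".
ANNOTATION = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def annotations(pages):
    """Yield (page, property, value) triples for every typed link."""
    for title, text in pages.items():
        for prop, value in ANNOTATION.findall(text):
            yield title, prop.strip(), value.strip()


def pages_with(pages, prop, value):
    """Pages asserting prop::value, roughly what {{#ask: [[prop::value]]}} lists."""
    return sorted(t for t, p, v in annotations(pages) if p == prop and v == value)


def relations_to(pages, value):
    """Which pages relate to value, and through which property."""
    return sorted((t, p) for t, p, v in annotations(pages) if v == value)


print(pages_with(pages, "Promotes", "Cytoskeletal development"))  # ['GeneA']
print(relations_to(pages, "Cytoskeletal development"))
# [('GeneA', 'Promotes'), ('GeneB', 'Inhibits')]
```

GeneC's plain link contributes nothing to either query, which mirrors Su's complaint: an untyped hyperlink records that two pages are connected, but not how.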
 
Semantic MediaWiki can work with Semantic Web standards, including the SPARQL query language and protocol; RDF, the Resource Description Framework, for describing data; and the Web Ontology Language, OWL, which gives RDF terms their meaning.
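At its core, a SPARQL query matches a set of triple patterns containing variables against RDF subject-predicate-object statements and returns every consistent set of variable bindings. The Python sketch below shows that matching step with a naive nested-loop join. It is a teaching illustration, not a SPARQL engine: terms are plain strings rather than IRIs, and the vocabulary and the `match`/`query` helpers are hypothetical.

```python
# Subject-predicate-object triples standing in for RDF statements. Terms are
# plain strings instead of IRIs, and the vocabulary is hypothetical.
TRIPLES = [
    ("GeneA", "promotes", "CytoskeletalDevelopment"),
    ("GeneB", "inhibits", "CytoskeletalDevelopment"),
    ("GeneA", "associatedWith", "DiseaseY"),
    ("GeneB", "associatedWith", "DiseaseZ"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):              # a variable, as in SPARQL
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                   # a constant must match exactly
            return None
    return extended


def query(patterns, triples=TRIPLES):
    """All solutions of a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?gene ?disease WHERE {
#            ?gene :promotes :CytoskeletalDevelopment .
#            ?gene :associatedWith ?disease . }
print(query([("?gene", "promotes", "CytoskeletalDevelopment"),
             ("?gene", "associatedWith", "?disease")]))
# [{'?gene': 'GeneA', '?disease': 'DiseaseY'}]
```

The shared variable `?gene` is what joins the two patterns; real triple stores use indexes and query planning instead of scanning every triple for every pattern.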
 
Ideally, Su said, Wikipedia could adopt Semantic MediaWiki technology, “but that is a longer row to hoe because of all sorts of technical and bureaucratic hurdles.”
 
Up until now, wiki-based collaborative projects have focused on lowering the barriers and structural hurdles for participants to entice them to contribute data. “The Gene Wiki and Wikipedia are very focused on making it as easy as possible for people to contribute data, meaning they require little or no structure,” Su said. The more structure the data require, the harder it is to find users willing to adhere to that structure and contribute, he added.
 
Even the layout templates of his Gene Wiki pages, he acknowledged, offer a bit of structure for “what is fundamentally an unstructured platform.”
 
“It’s sort of fake structure, it’s a structure in terms of the layout but it’s not structuring the data and so we’re not allowing people to do downstream data-mining,” he said.
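Su's "fake structure" can be made concrete with an infobox-style template call. Its named fields can be read by a script, but every value is an untyped string, and descriptive fields are prose. The Python sketch below is hypothetical throughout: the template name and fields are invented and are not the actual Gene Wiki template, and the parser is deliberately naive.

```python
# A hypothetical infobox-style template call; the template name and fields are
# invented and are not the actual Gene Wiki template.
WIKITEXT = """{{Hypothetical gene box
| symbol = GENEA
| chromosome = 6
| function = binds [[actin]]; role in disease unclear
}}
GeneA is a hypothetical gene used for illustration."""


def template_params(wikitext):
    """Return the name=value parameters of the first template call.

    Naive on purpose: it assumes no nested templates and no piped links
    inside parameter values.
    """
    start = wikitext.index("{{")
    end = wikitext.index("}}", start)
    params = {}
    for part in wikitext[start + 2:end].split("|")[1:]:  # [0] is the template name
        name, sep, value = part.partition("=")
        if sep:
            params[name.strip()] = value.strip()
    return params


params = template_params(WIKITEXT)
print(params["symbol"], params["chromosome"])  # GENEA 6
print(params["function"])  # free prose, not a typed relation
```

Reading `chromosome` works, but nothing declares it to be a number, and answering a question like "which genes promote cytoskeletal development?" from the `function` field would still require text mining.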

Xennex, a company formed to commercialize the Weizmann Institute's GeneCards database, has begun targeting intellectual property professionals in a move to broaden its customer base beyond biotech and pharma firms.

Last week, the company announced a multi-site, unlimited user agreement with the European Patent Office that provides some validation of this new strategy. Through the agreement, the database, which includes information on the structure and function of more than 21,000 human genes, will be made available to several hundred EPO patent examiners in Munich and Berlin, Germany; The Hague, The Netherlands; Vienna, Austria; and in EPO member states' national offices.

David Warshawsky, CEO of Xennex, said that a number of intellectual property law firms were already using GeneCards when the company began commercializing it in mid-2003. GeneCards has been under development since 1996 and has been freely available to academic and commercial users for most of that time. DoubleTwist marketed a commercial version of the database for less than a year before the company folded in 2002. Xennex stepped in after a "transition period" in which GeneCards was once again freely available to academic and commercial users through the Weizmann Institute and mirror sites.

Warshawsky, who estimated that the total number of GeneCards users is in the thousands, said that converting commercial users to paid licenses has been "very successful" due to the popularity of the resource. Commercial users are no longer permitted to use the database at all without a license, so it's not difficult to convince most of them to retain access to it.

The commercial version offers a level of security that the free version does not. Subscribers can access the resource via Xennex's secure server, or may opt to have the database installed behind a company firewall — an option that is not available in the academic version, Warshawsky said.

Xennex's customers include "many" biotech and pharmaceutical companies, as well as academic groups. Warshawsky noted, however, that "most of our clients wish to remain anonymous."

In a recent effort to expand its presence in the marketplace, the company signed an agreement with Lion Bioscience to integrate GeneCards with the new version of the SRS platform [BioInform 2-14-05] — a deal that Warshawsky said should appeal to a broader base of commercial users who already have access to SRS.

Xennex also decided to extend its reach in the intellectual property community by modifying its pricing model. Unlike its subscription model for life science researchers, which is a "flexible" flat fee based on the number of users and the type of installation, IP law firms can opt for a "pay-per-use" model, Warshawsky said. The structure appeals to IP law firms, he said, because they don't have to pay an up-front fee for the resource, and can charge their clients directly every time they use it.

GeneCards serves as an "add-on" to other common IP resources, Warshawsky said, such as the Derwent and USPTO patent databases, PubMed, Westlaw, and LexisNexis. The current version of GeneCards contains more than 42,000 entries on human genes and gene products extracted from more than 50 publicly available resources. Users can access the database to get a quick summary of current knowledge on any given gene, which Xennex claims is of great benefit to IP professionals conducting searches for prior art.

This application is what appealed to the European Patent Office, according to Gerard Giroud, principal director of documentation and tools for the EPO, who noted in a statement that the agency's biotech examiners had "requested" access to the resource.

The EPO has proven a willing customer for bioinformatics products that can support its patent-searching activities. Last June, it licensed QBioCom's MPSRCH sequence search software for analyzing biological sequences that are included within patent applications [BioInform 06-21-04]. This followed an announcement four months earlier that it had licensed Gene IT's GenomeQuest search tool for similar applications [BioInform 02-09-04].

Warshawsky said that the IP sector is a "great niche" for bioinformatics products like GeneCards, but noted that the company's entry in this market is a means of expanding — not replacing — its "bread and butter" customer base in biotech and pharma.

According to a statement on Xennex's website, the EPO's use of GeneCards "not only validates the value GeneCards brings to IP professionals," but should also attract firms working in biotech IP because "patent applicants would likely want to make sure to utilize as good, or better, resources as the agencies that determine whether their applications are patentable."

Warshawsky said that Xennex, which employs fewer than a dozen people, works very closely with the Weizmann Institute team to generate the commercial version of the database. Currently, Xennex is in the process of commercializing another set of Weizmann Institute databases that are integrated with GeneCards: GeneNote (human gene expression data); GeneAnnot (annotation of Affymetrix probe sets); GeneLoc (an integrated map of the human genome); and GeneTide (an automated system for annotating human mRNAs and ESTs and predicting genes).

Warshawsky said that the company plans to begin releasing these new products "one by one" some time this summer.

— BT
