Researchers at the Genomics Institute of the Novartis Research Foundation (GNF) have launched Biology Gene Portal Services, or BioGPS, a gene portal designed to display gene-centric annotation that can be customized for individual users.
BioGPS is slated to replace SymAtlas, a web application developed by researchers at GNF and the Scripps Research Institute in 2003 to explore tissue-specific gene expression datasets.
Andrew Su, senior research investigator in the computational biology group at GNF and lead developer of BioGPS, told BioInform that the new portal actually went live last summer, but its designers didn't consider it formally launched until they improved user customization and interactivity features and added a plug-in registration system.
The BioGPS project is funded with a five-year grant from the National Institute of General Medical Sciences. According to an NIH database, the project received $318,000 under the award in 2008.
BioGPS has slightly more than 200 users from the US, Canada, Europe, and Japan with a 50:50 split between academic and commercial users. This user base is expected to grow dramatically as GNF phases out SymAtlas, which is "getting a bit long in the tooth," Su said on the BioGPS blog.
SymAtlas has between 1.7 million and 1.8 million hits a year with about 40,000 unique visits annually, he said, noting that GNF plans to "translate our SymAtlas user base into a BioGPS user base" over the course of 2009.
BioGPS has incorporated Web 2.0 functionality in several ways. Scientists can customize the portal by creating a layout with resources of their choice, load their own data, share plug-in tools, and harness "community knowledge," for example by watching the popularity of plug-ins across the resource, he said.
Initially, the GNF scientists were planning an updated version of SymAtlas, but Su and his team began expanding functionalities as they brainstormed, he said. Now BioGPS is slated as the SymAtlas "successor," he said, and, as with any Web 2.0 application, is looking to gain a "critical mass of users."
"Our goal is to be a content aggregator and give the community easier access to all the resources that are out there," said Su, adding that the portal is a companion rather than a competitor of resources such as Entrez Gene.
"We're targeting the people out there who are using genome-wide technologies, microarrays, deep sequencing, proteomics," he said. Scientists performing a genome-wide profile of cancer-versus-normal may end up with 100 features that differ between their two data sets. "This is a hypothesis generation stage, so now the question is what are these 100 things? What the heck do they do?"
The BioGPS site is based on the "concept of layout," he said. Once a scientist enters a gene of interest, windows open up on that gene in several web-based resources, for example, model organism databases.
Presently, BioGPS is devoted to information from resources on human, mouse, and rat genes, but Su explained there are plans to expand, for example, to microbial genomes and other organisms.
As scientists move forward in analyzing experimental results, they generally consult up to a dozen "standard web sites," Su said, such as Entrez Gene, Ensembl, UniProt, or the Mouse Genome Informatics site. Each site delivers "partially overlapping gene annotation," so users must visit each, enter their search, learn the interface, and learn how to find each of the genes of interest on that site, he said. "Often that is a quite daunting process."
The idea behind BioGPS, Su said, is to avoid that process and to point researchers to smaller, lesser-known gene portals that they might otherwise have missed.
An Easy Handshake
A BioGPS search delivers information harvested from public databases on gene identifiers, aliases, location information in the genome, and Gene Ontology function.
Scientists exploring genes of interest might, for example, open up the Mouse Genome Informatics window next to the Rat Genome Database window. Going straight to the gene of interest in each window means "we don't need to figure out the interface for the Mouse Genome Informatics site, how to search, what identifiers they have," Su said. "BioGPS has done that mapping on the back end."
To make the application programming interface handshake happen, the team is using "the simplest API that you could imagine," he said. BioGPS uses IFrames, or inline frames, to let users drag windows around inside the main page, creating "a browser within a browser," Su said.
Database resources need not adapt to IFrame, since most sites are amenable to this kind of deep-linking, he said. "They have URLs that are formed based off having an identifier in the URL." BioGPS can translate many different identifiers, he said. "We just have to know what the format of the URL that MGI uses and substitute the right identifier."
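The deep-linking Su describes can be illustrated with a minimal Python sketch. It is not BioGPS code: the template URL, the `{identifier}` placeholder syntax, the function name `build_deep_link`, and the sample identifier are all assumptions made for illustration, and real resources define their own URL formats.

```python
from urllib.parse import quote

# Hypothetical URL template (assumption); it does not reflect any real site's format.
MGI_STYLE_TEMPLATE = "https://mgi.example.org/gene/{identifier}"


def build_deep_link(template: str, identifier: str) -> str:
    """Substitute a URL-encoded identifier into the template's placeholder."""
    # safe="" also encodes ":" and "/", which can appear in database identifiers.
    return template.format(identifier=quote(identifier, safe=""))


# "MGI:0000001" is a made-up identifier used only to show the substitution.
link = build_deep_link(MGI_STYLE_TEMPLATE, "MGI:0000001")
print(link)
```

The resulting URL is what a portal of this kind would load into a window for the gene in question, so the user never has to run a search on the target site.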
In a navigation toolbar, scientists can also create a personalized list of genes of interest, called "My Genes."
This is a function that Harvard Medical School researcher Neil Kubica is planning to use. Kubica, a postdoctoral fellow in the Department of Cell Biology, has previously used SymAtlas and said he is "brand new to BioGPS."
In an e-mail to BioInform, he said, "I am still not 100 percent sure if it will help us in our research."
Kubica is studying a microRNA that is increased in the PC3 cell line following treatment with a small molecule, and said that he is particularly interested in identifying particular mRNA targets for the miRNA.
Kubica said he is currently using several publicly available miRNA target prediction algorithms, including TargetScan 4.2 developed in the lab of David Bartel at MIT's Whitehead Institute, which "predicts [around] 700 mRNA targets for our miRNA of interest."
Now he is interested in using BioGPS to extract expression data for these genes from PC3 cells, other prostate cancer cell lines, and normal prostate tissue vs. prostate tumors.
Among his questions of interest: Are the mRNA targets for our miRNA of interest present or absent? What is the relative expression level of these mRNA targets in these cell lines and tissues?
In response to Kubica's query, Su outlined how BioGPS might help with a quest for the gene annotation of microRNA targets. TargetScan and other resources can be grabbed from the BioGPS plug-in library to create a miRNA layout in BioGPS, and users can then add gene expression pattern data to the miRNA layout, which can be saved for future use.
Larry Moran, a biochemist at the University of Toronto, told BioInform by e-mail that he had looked at a few of his "favorite genes" in the portal. "I don't think it's a very useful database," he said, since it is a summary of information gleaned from other databases with "no attempt at annotation."
In addition, he said, "much of the information is wrong or misleading," such as some of the expression profiles, which "seem to be incorrect; probably because the data is for another gene and not the one in the database record."
Users "who would rely on that sort of expression data would be making a very serious mistake," he said.
Reacting to these comments, Su said, "I think it is a good thing, in terms of making those errors more widely seen. The more eyes that see it, the more likely that that error will be fixed."
Being able to detect errors, however, has to be connected to the ability to fix them, he said. "This is the wiki principle, everybody can edit it, everybody can fix it, everybody has the responsibility and the power to make sure it's correct."
Other BioGPS users have documented their first experiences with the portal on various blogs. One frequent question is whether users can save their "My Genes" list until the next time they use the portal. That is "definitely an area of expansion," Su said.
Although users are not obliged to register to use the public resource, a free BioGPS account gives access to a library of 96 plug-ins. Slightly over a dozen plug-ins are private, he said.
Strictly speaking, these are more "conceptual" plug-ins, Su said, because code is not kept on a user's computer. "We want this plug-in library to grow," he said, adding that he and his staff seeded the original crop of BioGPS plug-ins.
The plug-in library includes such resources as Online Mendelian Inheritance in Man and the KEGG database for human pathways, which users can employ to tailor a layout to suit their needs and save that layout for future usage.
BioGPS is similar in some ways to the Distributed Annotation Service, or DAS, which was developed as a client-server system to integrate information from multiple genomic annotation servers.
The exchange of information via DAS is based on "very structured data" and its specifications are "difficult," Su said, making it a "great" tool for large genome centers with dedicated bioinformatics staff, but not as suitable for end-user biologists.
Data interchange in BioGPS, on the other hand, is through a web browser, so whatever a plug-in provider can show in HTML "can be shown in BioGPS," he said.
Growth of BioGPS could be fueled by the "community extensibility" aspect of the platform, he said, since users can create new plug-ins to share.
To do so, they must create a URL template with standard identifiers for that particular resource, such as an Entrez Gene identifier or UniGene identifier, that "most gene-centric portals use to query their databases," Su said.
BioGPS handles the actual identifier translation. "Users searching in their gene list of 50 [genes] do not need to know how each one of those plug-ins handles its database," said Su, because BioGPS does "gene identifier translation."
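A rough sketch of how that kind of identifier translation could work is shown below. It is an illustration under stated assumptions rather than the BioGPS implementation: the gene records, identifier values, plug-in names, URLs, and the helper `links_for_gene` are all hypothetical.

```python
# Hypothetical identifier table (assumption): one record per gene,
# keyed by identifier type. The values are made up for illustration.
GENE_RECORDS = {
    "GENE_A": {"entrez_gene": "1001", "unigene": "Hs.0001"},
    "GENE_B": {"entrez_gene": "1002"},  # no UniGene identifier on file
}

# Hypothetical plug-ins (assumption): each declares the identifier type
# it expects and a URL template with an {id} placeholder.
PLUGINS = {
    "portal_one": ("entrez_gene", "https://portal-one.example.org/gene?id={id}"),
    "portal_two": ("unigene", "https://portal-two.example.org/cluster/{id}"),
}


def links_for_gene(symbol):
    """Return one URL per plug-in, using whichever identifier that plug-in expects."""
    record = GENE_RECORDS[symbol]
    links = {}
    for name, (id_type, template) in PLUGINS.items():
        identifier = record.get(id_type)
        if identifier is not None:  # skip plug-ins whose identifier is unknown
            links[name] = template.format(id=identifier)
    return links


for gene in ("GENE_A", "GENE_B"):
    print(gene, links_for_gene(gene))
```

In this arrangement the user supplies only a gene list, and the lookup table decides which identifier goes into which plug-in's URL.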
Registering a plug-in with BioGPS gets someone "visibility with the BioGPS user-base," he said, and the person who registered it can then decide whether it is worth continuing to maintain, given the usage it receives.
The plug-in library uses a tag cloud, the common Web 2.0 popularity metric, to give users a visual representation of the popularity of plug-ins on the layouts that BioGPS users have saved.
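As a hedged illustration of the tag cloud idea, the Python sketch below counts how many saved layouts include each plug-in and maps those counts to font sizes. The linear scaling, the size range, and the function names `plugin_popularity` and `tag_sizes` are assumptions; the article does not say how BioGPS computes or renders its cloud.

```python
from collections import Counter


def plugin_popularity(layouts):
    """Count the number of saved layouts that include each plug-in."""
    counts = Counter()
    for plugins in layouts:
        counts.update(set(plugins))  # count a plug-in once per layout
    return counts


def tag_sizes(counts, min_size=10, max_size=30):
    """Map counts to font sizes by linear scaling (an assumed scheme)."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for name, n in counts.items():
        # If every plug-in is equally popular, give them all the largest size.
        scale = (n - lo) / span if span else 1.0
        sizes[name] = round(min_size + scale * (max_size - min_size))
    return sizes


# Made-up saved layouts, each listing the plug-ins it contains.
saved_layouts = [["OMIM", "KEGG"], ["OMIM"], ["OMIM", "TargetScan"]]
print(tag_sizes(plugin_popularity(saved_layouts)))
```

A plug-in that appears in more saved layouts is drawn larger, which gives new users a quick visual cue about what the community finds useful.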
"The more we can take advantage of this idea of community intelligence, the idea that the more people that use the application, the more powerful the usage patterns are for everyone," Su said.
While a help desk is not part of the grant, users with questions can post them on the BioGPS blog or query Su and his team. In the future, Su said, he wants to let users share not only plug-ins but also layouts, which could likewise be tagged according to popularity.
In a next step, users will also be able to share gene lists in BioGPS. For example, if a research team has its own list of genes that are differentially expressed in breast cancer, "it would be very interesting to compare that list to all the other saved and shared gene lists the rest of the biology community has shared," he said.
Currently, users can register a plug-in that either they alone or the community at large can see. "Eventually, probably in the next year, we will have very fine-grained permissions control," he said. "We want to allow people to be as secretive or as open as they want to be."