Building destinations on the Internet where researchers can share data, experimental workflows, newly developed applications, and results may sound like an established concept that became standard practice in the late '90s. However, it is only in the last few years that the so-called Web 2.0 technologies, which are based on service-oriented architectures that allow for social interaction and sophisticated Web applications, have matured enough to permit the design of useful, collaborative online environments.
"The Web has certainly existed for a long time, but only recently has there been a really big change in the types of technologies that Web browsers support, such as much more support for richer experience and applications, where you can go and build interactively in a drag-and-drop framework. And that all relies on new Web technology," says James Taylor, an assistant professor at Emory University. "It is an obvious thing to do, but part of it is just that it takes a lot of work to get right. One of our successes is really about the usability on the Web and that has taken a long time and a lot of observation and interaction with users to really make something that provides an effective and efficient experience throughout the Web."
Taylor is part of the original team that developed the Galaxy project, a Web-based open platform for performing reproducible genomic analysis. Galaxy also contains the Galaxy Pages — interactive, Web-based documents that provide users a way to communicate an entire computational analysis. But why even bother learning how to use such a resource when e-mail might suffice? Taylor says that there are a lot of benefits, and that the genomics community is steadily coming to that realization — the Galaxy servers process roughly 5,000 jobs per day. "There are places to get bioinformatics tools, and there have been for a long time, but the idea of actually integrated analysis environments that are Web-based [is] relatively new to the community, there aren't many others that are entirely focused on being integrated and entirely Web-oriented — it's a powerful idea," he says. "Our big push now is really about the Galaxy community. We've reached the point where we now have a lot of users on our main site, but also people who are running their own Galaxies, so we need to provide the infrastructure to make it easy for people who are building new tools and new workflows, so that they are better able to share those with other people."
Engineering an online social space where researchers and their colleagues can share and execute workflows was the basis for the myExperiment Virtual Research Environment. The myExperiment platform provides scientists with a collaborative environment where they can safely publish their workflows and experimental plans, share them with groups, and find those of others. Content can also be collected in digital bundles, called packs, that can be searched for and shared with other users. The University of Southampton's David De Roure, professor and myExperiment co-creator, says the initial inspiration to design such an environment came from the desire to take a Web 2.0 approach like that of YouTube or Flickr, but for science. "We set out to be the site for scientific workflows and to be usable in a really familiar way, plus we focused on the real needs of scientists so we set out to support privacy, credit, attribution, and licensing," De Roure says. "Mainly, we wanted to help people do research and share their know-how, but I must admit that partly we wanted to make a point to the academic software world that actually the Web is a really good way of doing things effectively for scientists. In other words, you don't have to build big complicated solutions, and what really matters is the social infrastructure."
The growth and usage of myExperiment are a good indicator of its usefulness: the site already hosts some 1,600 workflows for multiple workflow systems, and the number of literature citations is increasing. "We have lots of regular users plus all those people just coming in to download content, and we're spotting citations of myExperiment workflows in research papers. We're pleased that we've successfully become part of the scholarly knowledge cycle," De Roure says. "At the same time, the site has proven to be provocative in the fields of open repositories, linked data, and reproducible research, so we attract research attention from those communities, too. In a world that is rightly focusing increasingly on data, we remind people that methods matter just as much; it's not just having the data, it's what you do with it that counts. So myExperiment is kind of a manifesto for the primacy of method, and for reproducibility, and it challenges traditional notions of scholarly publishing."
Scratchpads is another freely available, collaborative Web-based framework, intended as a sort of social network for academic researchers who wish to store and share taxonomic data sets. Scratchpad sites are hosted by the Natural History Museum in London and typically contain original user content as well as materials imported from the Encyclopedia of Life project, an online resource that currently hosts 500,000 Web pages of detailed species descriptions. Encyclopedia of Life developers have written application programming interfaces (sets of specifications that allow software programs to seamlessly access other resources) that enable Scratchpads to download Encyclopedia of Life data to share with other Scratchpad communities and to publish additional species information to the Encyclopedia of Life.
The Scratchpad framework is designed for the construction of content-rich Web pages about any species using several resources, including GenBank, Morphbank, the Global Biodiversity Information Facility, and Google Scholar. It enables researchers to organize data around user-defined or imported ontologies and to use automated semantic annotation and indexing for easy curation and navigation of diverse biological data sets. The project is maintained by the Virtual Biodiversity Research and Access Network for Taxonomy (ViBRANT), a European Union-funded project that supports the development of virtual research communities involved in biodiversity science. To date, the ViBRANT community has amassed 2,340 users from more than 40 countries.
Other efforts are in the works to develop a collaborative data resource for well-annotated tissue samples that combines expression profiling, genome-wide methylation analysis using CHARM, and next-generation sequencing data. Launched in 2009, the Lung Genomics Research Consortium is a Web portal, hosted by Dana-Farber Cancer Institute, that aims to provide users with a fully integrated data repository and analytical tools within a collaborative framework. "By creating a rich data resource with integrated analysis tools, we are trying to support collaborative research on a much larger, even global, scale," says Mick Correll at Dana-Farber. "For most of the tools we are developing, the collaborative components are very much targeting the non-informaticians, so while not overly complex or comprehensive, they are intended to be simple, functional tools that people actually use."
Practically speaking, researchers can annotate and publish result sets that others can discover and use. A result set can be a defined cohort of patients, a gene signature found to discriminate between disease states, or a set of genomic regions found to be hypermethylated in a large fraction of a diseased cohort. Once a result set has been published, it can be used as input for analysis tools, or combined with other published or private data to derive new sets.
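For readers who think in code, the idea of deriving new result sets from published ones can be sketched with ordinary set operations. The Python below is only an illustration under stated assumptions: the `ResultSet` class, its fields, the `derive` helper, and the patient identifiers are all hypothetical and are not taken from the consortium's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultSet:
    """Hypothetical result set: a name, an annotation, and member IDs."""
    name: str
    members: frozenset
    annotation: str = ""
    published: bool = False


def derive(name, left, right, how="intersection"):
    """Build a new, unpublished set by combining two existing sets."""
    if how == "intersection":
        members = left.members & right.members
    elif how == "union":
        members = left.members | right.members
    elif how == "difference":
        members = left.members - right.members
    else:
        raise ValueError(f"unknown combination: {how}")
    note = f"{how} of '{left.name}' and '{right.name}'"
    return ResultSet(name, members, annotation=note)


# A published patient cohort (all identifiers are made up).
severe_cohort = ResultSet(
    "severe_cohort",
    frozenset({"P01", "P02", "P03", "P04"}),
    annotation="patients with severe disease",
    published=True,
)

# A private set kept by one researcher.
hypermethylated = ResultSet(
    "hypermethylated_at_region_X",
    frozenset({"P02", "P04", "P07"}),
    annotation="patients hypermethylated at a region of interest",
)

# Derive a new set by combining published and private data.
overlap = derive("severe_and_hypermethylated", severe_cohort, hypermethylated)
print(sorted(overlap.members))  # ['P02', 'P04']
```

Keeping each set immutable and recording how a derived set was built means the annotation travels with the result, which is the property that makes such sets reusable by other researchers.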
Correll foresees more collaborative, Web-based resources coming down the development pipeline in the near future. "I think it is clear that we are currently at an inflection point, where changing economics are democratizing access to genomic tools and information," he says. "In short, genomics is going mainstream, and what we really need are more tools that can help bridge the divide between the traditional power users and our new stakeholders — the clinicians, the individuals, the non-informaticians."