Researchers from the Broad Institute are seeking participants to test-drive a new software environment, called GenomeSpace, that lets researchers move their data between multiple genomic analysis tools regardless of the data's format.
Jill Mesirov, the director of computational biology and bioinformatics at the Broad Institute, announced the platform's availability during her keynote address at the Bio-IT World conference held in Boston last week.
GenomeSpace is currently in beta and supports six "seed" tools: GenePattern, Galaxy, the Integrative Genomics Viewer, Cytoscape, Genomica, and the UCSC Genome Browser.
In addition, the platform offers links to data sources such as InSilico DB, which contains thousands of curated public datasets that can be exported to analysis tools.
During her presentation, Mesirov invited biomedical researchers and tool developers to sign up to use the resource in their projects and offer feedback on its utility.
The developers hope that, in addition to putting the environment through its paces, the beta will generate new data-format converters that improve its interoperability and support the addition of more analysis tools to the GenomeSpace environment, Mesirov told BioInform this week.
Michael Reich, the director of informatics development for the Broad Institute's cancer program, described GenomeSpace as a connection layer that allows different tools to communicate. He said in a statement that the resource "acts as a broker, automatically detecting and converting files from one format to another for the user."
GenomeSpace connects these tools by means of two different application programming interfaces — a Java client development kit and a RESTful API — depending on the architecture of the tool in question, Mesirov explained to BioInform this week.
She said the six software packages selected for the initial launch were chosen because they represented “very popular genomic/bioinformatics tools” and “different types of architectures,” enabling GenomeSpace’s developers to make an interoperable system that was both “lightweight” and presented a “low barrier to entry” for both the tools and users.
According to the development team, GenomeSpace addresses an oft-encountered challenge for researchers who require separate tools at different stages in their projects.
Currently, researchers who use multiple analysis tools and data sources must convert between the different data formats those tools use, a process that often involves error-prone spreadsheet manipulation or requires the programming skills to write scripts.
"We strove to identify a range of critical biological problems, from 'microproblems' involving a couple of steps in two tools to complex scenarios on the scale of an extensive research paper," Aviv Regev, a faculty member of the Broad Institute, an associate professor at MIT, and a scientist at the Howard Hughes Medical Institute, said in a statement.
For instance, if researchers want to test a hypothesis about genetic differences between two stages of breast cancer, they might first use an analytical tool such as GenePattern to detect genes of interest; then IGV to view their genetic sequence; and then Cytoscape to see protein-protein interactions.
Using GenomeSpace would allow them to seamlessly transition between all of these tools to carry a project through to completion, the Broad team said.
Additionally, the resource could be used for smaller inquiries or simple conversions from one tool to another, the team said.
"GenomeSpace will empower biologists with no computational or programming background to maximize their ability to weave together biological insight with best-in-class computational tools," Regev said. "We hope it will make analyses accessible that were beyond the reach of many biologists."
Researchers can use the system by logging into GenomeSpace and selecting any of the six seed tools currently on offer, Mesirov said.
Alternatively, users can connect with GenomeSpace from any one of the six packages already linked to the platform, and send data to it for further analysis with other tools in the environment, she said.
GenomeSpace lets users upload, download, and manage files and directories in a cloud infrastructure provided by Amazon Web Services.
Reich told BioInform that each GenomeSpace account holder will receive some storage space in the cloud, although the team is still deciding how much space will be allotted to each user.
Mesirov said the group intends to be “flexible” in the amount of storage each user receives during the beta, although it won't be able to support groups that have very large quantities of data just yet.
"If somebody wants to store a terabyte of data, we can't do that right now," she said, adding that the group will be monitoring usage patterns of new registrants to make sure it can support them fully.
"The development team is fairly small and so we want to make sure that the initial registrants get a good amount of support," she said.
Reich added that the development team plans to expand the resource so that users with existing Amazon accounts can link their own storage infrastructure to GenomeSpace.
Additionally, the team is investigating some of the approaches scientists currently use to move and store their datasets in the cloud, such as Dropbox and the newly launched Google Drive, with an eye toward incorporating some of these capabilities into GenomeSpace, he said.
Following their presentations at Bio-IT World, both Mesirov and Reich said the response from researchers was “positive,” with some wanting to learn to use the resource, while others who have developed similar infrastructures shared their best practices.
A third group wanted to know whether the platform could replace commercial tools, Reich said, adding that this is a possibility the group is considering exploring further.
Besides new tools, the Broad team is encouraging users to submit data-format converters that will be made available to other users in the community.
The developers also plan to post examples of both small- and large-scale “driving biological projects” that showcase how the system handles analyses that require different tools, Mesirov said.
"We realize that there are a large number of people out there that may not know the tools but may have a strong scientific reason to want to use them, so we are publishing these small recipes for doing various things that teach you just enough of the tool that you need to know to get those analysis done," Reich explained.
Furthermore, GenomeSpace is linked to each analysis tool’s website, so users have access to whatever training documentation the developers of those tools provide, she said.
The group doesn’t have a fixed launch date for GenomeSpace, but Reich said that users can expect to see more functionality added that will make it “better and better over time.”
He also said that the group has received requests to develop an instance of GenomeSpace that is not linked to the Amazon cloud and that it plans to develop one at some point.
He pointed out, however, that researchers can already obtain the open-source platform from the Broad's Bitbucket repository and install their own version of the resource internally.