Aiming to provide a method for integrating and sharing scientific research information, an international group of researchers from more than 30 organizations has agreed to use a common standard to capture experimental metadata from a variety of research fields.
The team, led by researchers from Oxford and Harvard universities, described the standard, dubbed the Investigation-Study-Assay tools framework, or ISA, in a commentary published last week in Nature Genetics.
According to the paper, ISA is the "backbone" on which "discovery, exchange, and informed integration of data sets articulate with one another."
"There are hundreds of new technologies coming along but also many ways to describe the information produced," Susanna-Assunta Sansone, who leads the project at the University of Oxford, said in a statement.
With ISA, "we can take a jigsaw puzzle of different sciences and now fit the many pieces together to form a complete picture," she said.
The framework builds on the ISA software suite, composed of five open source software programs, for curating experimental metadata using standard ontologies. A paper describing the initial toolset and format was published in 2010 in Bioinformatics (BI 9/3/2010).
ISA tools include the ISA-Tab file format, which is an extensible, hierarchical structure for describing experimental metadata including sample characteristics, technology and measurement types, and sample-to-data relationships.
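As a rough illustration of what a tab-delimited metadata table of this kind might look like in practice, the sketch below reads a small, made-up study table and groups samples by a sample characteristic. The column headers follow the general style of ISA-Tab, but the table contents, the `STUDY_TABLE` variable, and the `read_table` helper are assumptions for illustration only; they are not taken from the paper or from the ISA software suite.

```python
import csv
import io

# Hypothetical, heavily simplified study table in a tab-delimited layout.
# Real ISA-Tab files carry many more columns; these rows are invented.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "donor1\tDanio rerio\tsample1\n"
    "donor2\tHomo sapiens\tsample2\n"
    "donor3\tHomo sapiens\tsample3\n"
)


def read_table(text):
    """Parse tab-delimited text into a list of {header: value} dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]


rows = read_table(STUDY_TABLE)

# Group sample names by the organism characteristic.
samples_by_organism = {}
for row in rows:
    organism = row["Characteristics[organism]"]
    samples_by_organism.setdefault(organism, []).append(row["Sample Name"])

print(samples_by_organism)
```

Because each row ties a source to a sample, the same approach could be extended to follow a sample through to its assay and data files, which is the sample-to-data relationship the format is meant to record.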
The tools also include the ISA software suite, which lets researchers create, edit, store, and serve ISA-Tab files so they can use community-defined minimum information checklists and ontologies. The software also converts ISA-Tab files to related formats for submission to a number of international public repositories including the European Nucleotide Archive, the Proteomics Identifications Database, and ArrayExpress.
A Single Framework
In the Nature Genetics paper, the authors state that while minimum reporting guidelines and standardized terminologies and formats have made it possible to structure, curate, and share data within single scientific disciplines, "the mountain of frameworks" needed to support data sharing between these communities has hampered the development of tools for broad data management, reuse, and integration.
In particular, information from studies involving multiple kinds of analyses — such as sequencing, protein-protein interaction assays, and measuring metabolite concentrations — is "challenging to share as coherent units of research" because of the "diversity of reporting standards with which the parts must be formally represented," the authors explain.
The paper acknowledges that there are ongoing efforts to bring data together into references, but notes that this is limited to large initiatives such as the Sage Bionetworks Commons and the Encyclopedia of DNA Elements projects, which are equipped to "navigate the various reporting guidelines, terminologies, and formats" necessary to bridge different domains.
With ISA tools, researchers have a framework that enables "coordinated use of reporting standards by service providers" within an open common data environment and at the same time "circumvents many of the problems caused by data diversity" since it captures metadata about the experiments, the authors explained.
ISA lets users capture information that can be generalized across different experiments such as the sample type, characteristics of where the sample was taken from, assays that were done, and subsequent analysis, and then structures it in the ISA-Tab file format, Sansone explained.
Users can then extend the framework by extracting domain-specific information from the files and creating databases. For example, the system could be used to capture and structure information on experiments about metabolites and then create a database that researchers can query for information on metabolomic experiments.
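The idea of loading structured metadata into a queryable database can be sketched as follows, using an in-memory SQLite database. This is only a minimal sketch under stated assumptions: the `assay` table, its columns, the sample rows, and the helper functions `build_db` and `samples_for_measurement` are all hypothetical, and the ISA tools' own storage layer may work quite differently.

```python
import sqlite3

# Hypothetical metadata rows: (sample, measurement type, technology type).
# These values are invented for illustration.
ASSAY_ROWS = [
    ("sample1", "metabolite profiling", "mass spectrometry"),
    ("sample2", "transcription profiling", "DNA microarray"),
    ("sample3", "metabolite profiling", "NMR spectroscopy"),
]


def build_db(rows):
    """Create an in-memory database holding one row per assay record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assay (sample TEXT, measurement TEXT, technology TEXT)"
    )
    conn.executemany("INSERT INTO assay VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def samples_for_measurement(conn, measurement):
    """Return the sample names recorded for a given measurement type."""
    cur = conn.execute(
        "SELECT sample FROM assay WHERE measurement = ? ORDER BY sample",
        (measurement,),
    )
    return [r[0] for r in cur.fetchall()]


conn = build_db(ASSAY_ROWS)
metabolite_samples = samples_for_measurement(conn, "metabolite profiling")
print(metabolite_samples)
```

A domain-specific resource would presumably add its own columns and controlled vocabulary terms on top of such shared fields, while the general sample and assay descriptions stay comparable across domains.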
Sansone told BioInform this week that since the 2010 Bioinformatics publication, the suite has been implemented in a number of scientific domains, the ISA-Tab format is now used in software tools other than the ISA suite, and the ISA user community has grown to include developers who contribute code to the existing infrastructure so that new modules can be added.
ISA's developers and proponents have also established an online presence, dubbed the ISA Commons, through which they provide support and forums for discussing the ISA-Tab file format and other ISA components.
Sansone explained that the website was set up so that users can describe how they use the format in their domains as well as to illustrate the "flexibility" of the format for annotating experiments.
It has become an open framework "that belongs to the community who is using it" and who can help newcomers get set up, she said, adding that her team is on hand to provide support for the initial adoption process. Such support includes bug fixes, responses to new feature requests, demonstrations of how the system works, and adjusting the system to meet researcher requirements.
Sansone said her group hasn’t had to do much evangelizing because new users find out about the system through publications that reference the standard and from discussions with collaborators who have come across the system.
They "come to us because there is so much need out there," she said.
Additionally, the website has fostered a community of researchers who are developing data-curating and -sharing tools that cover multiple domains including environmental health, environmental genomics, metabolomics, proteomics, systems biology, transcriptomics, and toxicogenomics, the researchers wrote in the Nature Genetics paper.
ISA at Work
One example of how ISA makes connections between unrelated experiments is at the Harvard Stem Cell Institute, where researchers use the system to find relationships between various experiments, ranging from studies of normal blood stem cells in fish to cancers in children, according to Winston Hide, director of the institute's Center for Stem Cell Bioinformatics and an associate professor of bioinformatics at the Harvard School of Public Health.
Hide, an early adopter of the ISA framework and a co-author on the Nature Genetics paper, told BioInform that he began using the system to help researchers involved in environmental health projects at Harvard's public health school find experiments similar to theirs. His team then implemented the system at the stem cell institute, where it is used in the group's stem cell discovery engine, an online database of curated cancer stem cell experiments that are coupled to the Galaxy analytical framework.
Another Harvard project, the Library of Integrated Network-based Cellular Signatures, is also using ISA. Led by Peter Sorger and Timothy Mitchison, that effort aims to create libraries of signatures that describe how cells respond to perturbation.
Meanwhile, GigaScience, an open access journal published by BioMed Central and BGI Shenzhen, is using the framework to "harmonize" and present large datasets in a standardized and usable format prior to publishing this information in the journal, Scott Edmunds, the journal's editor, said in a statement.
The ISA group is now looking to generate interest among pharmaceutical, chemical, and agricultural companies and has tapped UK-based ConnectedDiscovery to facilitate these collaborations, Sansone said.
ConnectedDiscovery focuses on precompetitive collaboration and knowledge management, Bryn Williams-Jones, the company’s founder and chief operating officer, told BioInform.
Williams-Jones, who previously worked at Pfizer, explained that the company was "keen" to support the ISA Commons project because pharma has "struggled for a long time" trying to find ways to manage and use large public datasets.
In its work with ISA, the company is "trying to get some of the right people in touch with the project so that we can make sure the data is visible and make sure they are aware of what's going on and make best use of the services that are already there," he said.
Additionally, groups like the Functional Genomics Data Society have shown interest in the system, as have funding agencies such as the National Institutes of Health's National Institute of Environmental Health Sciences and the UK's Biotechnology and Biological Sciences Research Council, Harvard's Hide told BioInform.
ISA has some "pretty heavy parties behind it that think it's worthwhile and are actively encouraging individuals within their funding cycles to consider this as an approach moving forward," he said, though he warned that it's still too early to claim ISA as a universal standard for describing experiments.
In the Nature Genetics paper, the researchers write that they are interested in working with ongoing complementary efforts that are attempting to standardize methods of sharing and integrating data.
"We realize that we are not solving all the problems. This is really just a drop," Sansone said. But "we are showing that something can be done or at least that the focus should be put on ... ideally the experimental metadata ... and this seems to work."