One of the computational projects of a research pact between the University of Birmingham, BGI, and its open access journal GigaScience has been funded by the UK's Natural Environment Research Council.
The six-month project, which is funded by NERC's Mathematics and Informatics for Environmental Omic Data Synthesis program, aims to explore the possibility of extending Galaxy — an open web-based sequence analysis platform developed by researchers at Pennsylvania State and Emory Universities — to include tools for metabolomics data analysis.
Although Galaxy is a popular analysis tool in areas such as genomics and proteomics, "[what] we did notice is there wasn’t much use of Galaxy in metabolomics," Peter Li, GigaScience's data organization manager, told BioInform.
"[We] thought there was a gap there which could be filled by … tailoring and developing Galaxy for use in the metabolomics domain," he said.
Specifically, the partners will use the NERC funds to embed Matlab-based metabolomics data processing scripts and statistical analysis scripts developed by researchers at the University of Birmingham into the Galaxy workflow system, as well as to purchase a server that will be installed at the university and be used to host and run the software.
In a statement, Mark Viant, a professor of metabolomics from the University of Birmingham, said that the collaboration with BGI "aligns perfectly with one of our major goals at Birmingham," which is "to develop tools and resources to facilitate the wider use of metabolomics by environmental scientists, and subsequently to provide training in these tools."
Researchers at the university will use the platform to explore the toxicological responses of organisms to environmental pollutants by analyzing large-scale metabolomics data. They are currently studying the metabolic responses of the freshwater model organism Daphnia to both pollutants and engineered nanomaterials.
The partners also plan to make their pipelines available to the environmental science community to encourage broader use of metabolomics, a field that to date hasn’t grown as quickly as some other omics branches, according to representatives from both BGI and the University of Birmingham.
The University of Birmingham's Viant noted that while other life science disciplines such as genomics and transcriptomics have flourished, metabolomics remains a "relatively small player," in part because current analysis tools aren't easy for biologists to use.
"Commercial workflows are arguably not yet as good as many of the workflows developed in academic labs" but on the other hand "software developed in academic labs isn't so professional and can be clunky, harder to use, [and] may need experience in some programming languages," he told BioInform. This creates an analysis bottleneck that in a sense has "stifled" the growth of the field, he said.
He said that over the last 10 years his lab has developed and optimized several data processing pipelines and statistical analysis tools for metabolomics that he and his colleagues believe can benefit researchers in the community. These include applications for normalizing mass spectrometry data, managing variance, and imputing missing values.
However, since these tools were written in the Matlab programming language, interested researchers would need to have some coding experience before they could use them, Viant said.
That’s why his group is working with BGI's GigaScience to embed the scripts in Galaxy. "We are just trying to … put more user-friendly tools in place to make it more approachable" and more generally "to enable the [metabolomics] discipline to grow," he said.