CHICAGO (GenomeWeb) – The field of metabolomic informatics got a little more interesting in the second half of 2018, but it's still rather fragmented.
There have historically been three distinct databases for metabolomics, according to Gary Patti, who runs a metabolomics research laboratory at Washington University in St. Louis.
The European Bioinformatics Institute defines metabolomics as the study of metabolites in cells, tissues, or organisms. It's a complex field, in some ways more mature than even genomics, since 5 million newborns in North America get screened for various metabolic conditions each year, according to David Wishart, director of Canada's Metabolomics Innovation Centre.
"Humans are pretty complicated metabolically, so they have a different metabolism than yeast and different ones from plants as well," Wishart said. This makes metabolomics a great candidate for informatics technologies.
"Trying to have some detail about where the compound's from, what it does in the body, how it functions, its enzymatic reactions, any receptors that it might bind, pathways that it participates [in], that's the kind of information we try and put into our databases," he said.
Perhaps the best-known metabolomic database, METLIN, started at Scripps Research Institute in La Jolla, California, in 2003 and went online for the public two years later, according to Gary Siuzdak, senior director of the Scripps Center for Metabolomics. Companion analytics software, called XCMS, to process mass spectral metabolite data, was created in 2006 and came online in 2012.
Oliver Fiehn, who directs the National Institutes of Health-funded West Coast Metabolomics Center at the University of California, Davis, Genome Center, created another database, called FiehnLib. That fully open-source repository is affiliated with the Metabolomics Workbench, which is the data-coordination piece of the NIH Common Fund's Metabolomics Program, hosted at UC-San Diego.
The third major database is the Human Metabolome Database, created at the Metabolomics Innovation Centre at the University of Alberta in Edmonton. That dates to the origin of the Human Metabolome Project in 2005.
There are some deep divisions among the three, some due to competitive pressures and others because of the nature of metabolomics.
"There will always be untargeted and there will always be targeted metabolomics, so that means there are fundamentally going to be two camps, two software methods," Wishart explained.
Wishart also noted that the instruments used in nuclear magnetic resonance are completely different from those for mass spectrometry, and the latter breaks down into liquid and gas chromatography. "You can't use the same software tools for the three instruments," Wishart said. "At a minimum, it's going to be three times two, targeted times untargeted, three platforms, so at least six different types of software tools."
He added that those who have built databases have put in countless hours, so they become attached to their hard work. "It would be pretty easy for someone to just screen-scrape all these databases and consolidate them into one resource. So far no one's done it, partly because I think it would be like copying someone's books and saying, 'Here's my bigger book,'" Wishart said.
"The ethos in science is that someone who has spent years of their life putting a database together largely gets the credit for it and maintains it."
The technology hasn't been all that effective, either. In a 2015 commentary in the Proceedings of the National Academy of Sciences, a team from UCSD found that the average untargeted metabolomics experiment identified only 1.8 percent of spectra from the NIH PubChem database.
While gene ontology has helped unify genomic informatics to an extent, Wishart said, metabolomics is a significantly smaller field, with an estimated 3,000 to 4,000 people worldwide.
"I think if more people enter into metabolomics, they'd like some sort of one-stop shop," Wishart said. "I think they would like to have single-source databases to do their spectral searching. I think they would like to have single-source analytical tools," he said.
Some of the established players are trying to position themselves as just that, based on two developments in recent months.
In August, NIH awarded $12 million to a UCSD team led by Shankar Subramaniam to continue development on the Metabolomics Workbench, a repository of metabolomics data, metadata, and other resources intended for research use and, eventually, clinical applications.
This new funding will allow the researchers to expand the Workbench to include a wide range of clinical data, including demographic information for patients and participants in clinical trials. They will also collect information on study size, the randomization process, nature and duration of interventions, and other critical information.
It builds on a grant that was part of the NIH Common Fund Metabolomics Program, which was launched in 2012 to facilitate metabolomics-focused biomedical research in the US. Over the last six-plus years, Subramaniam and his team have gathered data from more than 1,000 metabolomics studies for the Workbench.
The Workbench, which is housed in the cloud at the San Diego Supercomputer Center, contains more than 50,000 experimentally annotated metabolites along with upwards of 1 million computationally generated metabolites described in terms of their structure, classification, and computed spectra.
Siuzdak said that Subramaniam received the funding because the UCSD bioengineer is trying to make metabolomic data more accessible. However, Siuzdak, whose Scripps Center for Metabolomics is practically across the street from Subramaniam's lab, said that his own work takes the UCSD efforts "a significant step further."
Siuzdak was co-corresponding author of a paper that appeared in Nature Methods last summer describing an advance to XCMS-METLIN, namely software designed to move those tools into targeted analysis.
The paper discussed and coincided with the public release of XCMS-MRM and METLIN-MRM, cloud-based platforms that let researchers make their multiple-reaction monitoring (MRM) analyses publicly accessible to other researchers.
"XCMS-MRM was developed out of a need for a single MRM platform that could be accessed from anywhere around the world," Siuzdak said. This, he explained, will help grow the communities of XCMS and MRM users. "METLIN-MRM was developed [because] nothing even close to this type of resource is available," Siuzdak said.
XCMS now has registered 25,000 users and METLIN close to 30,000, according to Siuzdak. METLIN's database has MS-MS data "in both positive and negative ionization modes at multiple collision energies on well over 200,000 molecular standards," he said. That is up from 15,000 when the MRM project started.
With the XCMS user base, Siuzdak reported having about 300,000 instances of data sharing since that system went online. "We see that possibility of data sharing becoming much more ubiquitous and making it accessible to a much larger population of people with this XCMS-MRM and METLIN-MRM," he said.
The user community can upload their own molecular transitions, then a curation team reviews these submissions to make sure that the data is in a standardized format, the Nature Methods paper explained.
The plan is to grow METLIN to 500,000 molecular standards, a process that could take two years at the current addition rate of 5,000 to 10,000 per week. "We don't have the funding to ramp it up even faster," Siuzdak said.
What Scripps does have is access to the same San Diego Supercomputer Center that hosts Metabolomics Workbench. "One of the central limitations with data processing is that it's a lot of data and Scripps is sort of uniquely configured to be able to handle these very, very large, data-intensive jobs," said Wishart.
"The alternative is usually to get commercial software, which runs your laptop red-hot because it's crunching away for so long," he said. Some mass spectrometers have this kind of setup, he noted, but they are unable to handle the kind of volume XCMS-METLIN offers.
"I think XCMS-METLIN is getting to be pretty comprehensive, one-size-fits-all," Wishart continued.
But it is far from a metabolomics panacea.
Notably, Scripps had to self-finance much of the METLIN and XCMS work, so the data is not completely public. The METLIN database of mass spectra is licensed and sold through commercial publisher John Wiley & Sons, though Siuzdak said the Wiley collection has not been updated in more than a year and is far smaller than the freely available online version that Scripps maintains.
The Human Metabolome Database and FiehnLib were publicly supported, so the Metabolomics Innovation Centre and the West Coast Metabolomics Center do make their repositories and corresponding software fully open-source.
This is the point of some controversy. "We will not use that tool," Fiehn said of the new XCMS-METLIN technology.
"We do not see benefits for a MRM repository, as this is merely a collection of possible MS-MS transitions, not optimized transitions (for specific instruments) and not driven by validated LC methods," Fiehn added in an email to GenomeWeb. Fiehn said he is working with Subramaniam on further development of the Metabolomics Workbench "as [a] national repository of metabolomics data."
Wishart said that those in the field will continue to use multiple databases and technology, including MetaboAnalyst, a statistical data-reduction software package he developed at the University of Alberta. "Typically, people will use a combination of XCMS, METLIN, HMDB, and MetaboAnalyst to figure everything out," he said.