Researchers Explore Potential Solutions, Underlying Causes of Metabolomics' Identification Difficulties


NEW YORK (GenomeWeb) – As a field, mass spec-based metabolomics has grown rapidly in recent years, but identification of molecules detected in such experiments remains a considerable challenge.

For instance, in a recent commentary in the Proceedings of the National Academy of Sciences, a group of University of California, San Diego researchers noted that in the average untargeted metabolomics experiment only 1.8 percent of spectra are identified.

The challenge is similar to that faced in proteomics experiments, in which researchers try to match spectra to peptides. But while proteomics deals with a relatively limited number of molecules — 22 amino acids and some 20,000 predicted protein-coding genes in humans — metabolomics experiments must sort through a massive diversity of chemical structures.

As the UCSD researchers noted, there are more than 60 million molecules in the PubChem database, "but only 220,000 MS/MS spectra representing about 20,000 molecules that are accessible for untargeted metabolomic experiments."

The UCSD commentary addressed a separate PNAS paper published this month in which a team led by researchers from Friedrich Schiller University put forth a new computational approach aimed at improving metabolomic identifications. The method, which they named CSI:FingerID, uses a combination of fragmentation tree computation and machine learning to help identify metabolites detected by mass spec.

In the approach, the researchers compute fragmentation trees that explain the spectra of unknown molecules. Using a mass spec database of reference molecules to train the algorithm, they then predict a molecular fingerprint for each unknown molecule based on the molecular properties of known structures.

They train a support vector machine (SVM) classifier for each of these molecular properties, which sorts molecular structures into those that have the property and those that don't. To identify an unknown compound, they take its spectra, determine its similarity to the various compounds in their reference database, and score the fingerprints of those compounds against the fingerprint generated for the unknown compound.
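The final scoring step can be illustrated with a minimal sketch. It assumes the per-property classifiers have already produced a probability for each fingerprint bit of the unknown compound, and it ranks candidate structures with a simple independent-bit log-likelihood. The property names, probabilities, and candidate fingerprints below are hypothetical, and CSI:FingerID's own scoring function may differ.

```python
import math

# Hypothetical fingerprint bits; real molecular fingerprints use
# hundreds to thousands of substructure/property bits.
PROPERTIES = ["has_aromatic_ring", "has_carboxylic_acid", "has_amine", "has_sulfur"]


def score_candidate(predicted_probs, candidate_bits, eps=1e-6):
    """Log-likelihood of a candidate's binary fingerprint, treating each
    predicted property probability as an independent Bernoulli estimate."""
    score = 0.0
    for p, bit in zip(predicted_probs, candidate_bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        score += math.log(p) if bit else math.log(1.0 - p)
    return score


def rank_candidates(predicted_probs, candidates):
    """Return candidate names sorted from best-matching to worst-matching."""
    return sorted(
        candidates,
        key=lambda name: score_candidate(predicted_probs, candidates[name]),
        reverse=True,
    )


# Hypothetical classifier outputs for an unknown spectrum, one per property.
predicted = [0.9, 0.8, 0.1, 0.05]
# Hypothetical reference-database candidates and their known fingerprints.
candidates = {
    "candidate_A": [1, 1, 0, 0],
    "candidate_B": [1, 0, 1, 0],
    "candidate_C": [0, 1, 0, 1],
}
print(rank_candidates(predicted, candidates))
# ['candidate_A', 'candidate_B', 'candidate_C']
```

Summing log-probabilities rewards a candidate for every bit that agrees with a confident prediction and penalizes it sharply for contradicting one, which is why candidate_A, whose fingerprint matches all four predictions, ranks first.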

Applying the approach to the analysis of two large-scale metabolomics databases, the researchers found they could make 150 percent more correct identifications than the second-best search method, and 5.4-fold more unique identifications. They also noted that they expect the method to improve with additional training data.
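Taken literally, "150 percent more" means roughly two and a half times as many correct identifications as the runner-up method, not one and a half times. Writing N for the number of correct identifications (symbols introduced here only for illustration):

```latex
N_{\text{CSI:FingerID}} = N_{\text{second best}} + 1.5\,N_{\text{second best}} = 2.5\,N_{\text{second best}}
```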

In another paper published this month, this one in Analytical Chemistry, a different group, led by scientists at the Scripps Research Institute in La Jolla, California, also took aim at the challenge of making identifications in untargeted metabolomics, offering not a new solution but highlighting a potential cause.

In this work, the researchers aimed to demonstrate the degree of degradation caused by the heating involved in gas chromatography, which is commonly used upstream of mass spec in metabolomics experiments.

Depending on the experiment, gas chromatography cycles can involve heating samples to 300 °C or more for extended periods of time. As the Scripps team demonstrated in their study, this can cause extensive degradation of molecules, creating additional products beyond those originally introduced into the GC-MS system and further complicating the already challenging process of making metabolite IDs.

In the paper, the researchers heated both underivatized and derivatized samples (derivatization is used to modify non-volatile compounds so that they can be analyzed by GC) at three different temperatures (60, 100, and 250 °C) and three different exposure times (30, 60, and 300 seconds), then ran them on LC-MS to identify changes in the molecules due to heating.
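The size of this design can be checked by enumerating it. The sketch below assumes a full factorial layout, with every temperature paired with every exposure time for both sample types; the article's wording suggests this but does not state it outright, and any unheated controls are not modeled.

```python
from itertools import product

# Values taken from the article's description of the heating experiment.
SAMPLE_TYPES = ["underivatized", "derivatized"]
TEMPERATURES_C = [60, 100, 250]   # heating temperatures, °C
EXPOSURE_TIMES_S = [30, 60, 300]  # heating durations, seconds


def heating_conditions():
    """Enumerate every (sample type, temperature, exposure time) combination,
    assuming a full factorial design."""
    return [
        {"sample": s, "temp_c": t, "time_s": d}
        for s, t, d in product(SAMPLE_TYPES, TEMPERATURES_C, EXPOSURE_TIMES_S)
    ]


conditions = heating_conditions()
print(len(conditions))  # 18
```

Under that assumption, each sample type is heated under nine temperature/time combinations, giving 18 heated conditions in total.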

The analysis found significant effects from this heating: more than 40 percent of the peaks in a plasma sample were changed after heating at 250 °C for 300 seconds. Looking at a set of 64 small molecule standards, the researchers found that most showed the effects of degradation even at the 30-second time point.
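The article does not say how a "changed" peak was defined. One plausible way to compute such a percentage is sketched below, assuming peaks from the unheated and heated runs have already been aligned to shared IDs and treating a peak as changed if it appears, disappears, or shifts in intensity by at least a fold-change threshold. The two-fold threshold and the peak values in the example are hypothetical.

```python
def percent_peaks_changed(before, after, fold_threshold=2.0):
    """Percent of peaks (union of both runs) that appeared, disappeared,
    or changed in intensity by at least `fold_threshold`.

    `before` and `after` map an aligned peak ID (for example an
    m/z-retention-time key) to its measured intensity.
    """
    all_peaks = set(before) | set(after)
    if not all_peaks:
        return 0.0
    changed = 0
    for peak in all_peaks:
        a = before.get(peak, 0.0)
        b = after.get(peak, 0.0)
        if a == 0.0 or b == 0.0:
            # Present in only one run: a new product or a lost compound.
            if a != b:
                changed += 1
        elif max(a, b) / min(a, b) >= fold_threshold:
            changed += 1
    return 100.0 * changed / len(all_peaks)


# Hypothetical intensities before and after heating.
before = {"p1": 100.0, "p2": 50.0, "p3": 10.0}
after = {"p1": 110.0, "p2": 10.0, "p4": 30.0}
print(percent_peaks_changed(before, after))  # 75.0
```

Counting new peaks as well as lost or shifted ones matters here because, as the Scripps team argues, heating can create products that were never in the original sample.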

The results indicate that "a lot of the things we are observing are likely coming from thermal degradation as opposed to the original sample," Scripps researcher Gary Siuzdak, senior author on the paper, told GenomeWeb.

"I think this allows people to look at the results coming out of a GC-MS and say, we have to make quite sure that what we are seeing is not just a degradation product but [actually] a real molecule associated with the biological or chemical system we are interested in," he said.

In one sense, the extent of degradation observed was surprising, said Siuzdak, whose lab relies more on LC than GC in its metabolomics work. "Although," he noted, "in retrospect, how surprised can we be to learn that if we are heating things up to 250 °C or higher temperatures causes significant degradation and molecular transformation?"

Indeed, several metabolomics researchers suggested that while the Analytical Chemistry paper was valuable in that it highlighted an issue that receives relatively little discussion in metabolomics circles, its findings were not particularly surprising and, in many cases, were already familiar to frequent users of GC-MS.

"Especially in GC-MS, one of the bottlenecks [in metabolomics] is that you can identify very confidently a lot of the main metabolites, but there are so many unknown compounds which are present that we just pick to ignore," said Daniel Dias, a researcher at RMIT University and formerly a fellow at Metabolomics Australia. "Are these unknown compounds true biological compounds? Or are they degradation products? It is something that anyone in the field of metabolomics needs to take into consideration and really have an understanding of the temperatures, extraction methods, and the fundamental laboratory practices involved."

Dias suggested, however, that most GC-MS metabolomics researchers are well versed in these issues.

"A lot of the compounds which they have identified are sort of common knowledge, stuff that we are aware of," he said. "It was, like, 'Yep, I know about this, and I think a lot of people know about this.'"

Dias gave the paper credit, though, for raising a matter that he said was little discussed among researchers, even if reasonably well understood.

Dan Bearden, a research chemist with the National Institute of Standards and Technology, said the paper was a good example of the sort of work looking at sources of systematic variation in metabolomics experiments that the field should be undertaking.

"I think it's a healthy approach to take," he said.

However, he questioned the extent to which the experimental model, in which the researchers heated various vials of samples and then ran them on LC-MS, actually reflected a true GC-MS experiment.

Oliver Fiehn, director of the West Coast Metabolomics Center at UC Davis, was less equivocal. In an email to GenomeWeb, he called the study design "seriously flawed," noting several steps taken by the Scripps team that were uncharacteristic of a GC-MS experiment, including heating of underivatized plasma and the use of water-based solvents with derivatized samples, which, he said, causes immediate degradation of amino and sulfur compounds.

Additionally, Fiehn said, unlike the conditions used in the Analytical Chemistry paper, in which all molecules in a sample were subjected to the same level of heat and then analyzed via LC-MS, in a GC-MS experiment, molecules are subjected only to the amount of heat required to bring them to their boiling point, upon which they enter the gas phase.

"If a compound has a boiling point of 100 °C, it will not face a temperature of 200 °C," he said. "Other compounds may have a boiling point of 200 °C and, again, would not face [a] temperature of 300 °C."
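Fiehn's argument can be expressed as a deliberately simplified model of a linear GC oven ramp. The sketch assumes, as in his framing, that a compound leaves the heated zone once the oven reaches its boiling point; in practice GC elution also depends on vapor pressure and interaction with the column's stationary phase, and the model ignores other heated parts of the system such as the inlet. The oven program values (60 °C start, 10 °C/min ramp, 300 °C end) are hypothetical.

```python
# Hypothetical oven program: start at 60 °C, ramp 10 °C/min, end at 300 °C.
START_C = 60.0
END_C = 300.0
RAMP_C_PER_MIN = 10.0


def max_exposure_temp(boiling_point_c, start_c=START_C, end_c=END_C):
    """Highest oven temperature a compound sees, assuming (simplistically)
    that it elutes as soon as the oven reaches its boiling point."""
    if boiling_point_c <= start_c:
        return float(start_c)
    return float(min(boiling_point_c, end_c))


def minutes_until_elution(boiling_point_c, start_c=START_C, end_c=END_C,
                          ramp_c_per_min=RAMP_C_PER_MIN):
    """Minutes of ramping before the oven reaches the boiling point,
    or None if the program ends first."""
    if boiling_point_c > end_c:
        return None
    return max(0.0, (boiling_point_c - start_c) / ramp_c_per_min)


# Fiehn's two examples: boiling points of 100 °C and 200 °C.
for bp in (100, 200):
    print(bp, max_exposure_temp(bp), minutes_until_elution(bp))
# 100 100.0 4.0
# 200 200.0 14.0
```

Under this model, only compounds that have not eluted by the end of the program see the full 300 °C, which is the contrast Fiehn draws with the paper's design of heating every molecule in a vial to the same temperature.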

More generally, Fiehn said, the paper ignored the fact that the GC-MS community has done extensive validation work looking at the behavior of compounds under different experimental conditions.

For instance, he said, "we have in the last 10 years run about 200,000 samples under identical conditions, for thousands of experiments, and have a database of GC-MS peaks that looks at currently 6,500 unique compounds."

Given the challenges involved, a certain conservatism regarding the technique's capabilities is also required, Fiehn said. "We are careful not to promise, for example, 1,000 identified compounds, so that we are sure to deliver. But we always find hundreds of compounds correctly, and reproducibly, identified."