Staunch metabolomicists contend that a complete map of all the small-molecule metabolites in the human body would do more to deliver on the 21st-century promise of personalized medicine than any advance in genomics or proteomics. The challenge is that, for all its predicted clinical usefulness, metabolomics is an exponentially more difficult problem to tackle; some researchers say it is still not clear what would even constitute a complete picture of all human metabolites. An organism's metabolome varies greatly over its lifetime, and even over the course of a single day.
"A challenge that faces metabolomics is the fact that there is no clear definition of the metabolome, and linking changes in metabolite levels to meaningful biological interpretation is difficult," says Lorraine Brennan, a lecturer in nutritional biochemistry at University College Dublin. She does see some light at the end of the tunnel in the form of certain tools such as the nuGO wiki, which houses the Nutritional Metabolomics Database, as well as visualization tools like PathVisio, which works with many different types of data on familiar biological pathways — but there is still much to be done.
Brennan and her colleagues published a paper in BMC Bioinformatics last November where they introduced a software solution called MetaFIND. "Metabolomics data is highly correlated, and MetaFIND was developed to aid researchers to assess correlation within the data and thus aid in identification of metabolites," Brennan says. "It is an easy-to-use, open access software that can be used to enhance your favorite feature selection technique." Commonly used feature selection methods often fall short of capturing the full set of relevant features in the case of high-dimensional, multi-collinear metabolomics data.
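The general shape of correlation-aided feature selection can be sketched in a few lines. The code below is not MetaFIND itself, just a minimal illustration of the idea Brennan describes: after any feature selection step, pull in additional peaks that correlate strongly with the selected ones, since multiple peaks arising from the same metabolite tend to rise and fall together. The function names and the 0.8 cutoff are assumptions for the example.

```python
from statistics import fmean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def expand_feature_set(columns, selected, threshold=0.8):
    """columns: one intensity vector per peak (feature), measured across
    the same samples. Returns indices of features outside `selected`
    whose absolute correlation with any selected feature meets the
    threshold -- signals a univariate selector may have missed."""
    extra = []
    for i, col in enumerate(columns):
        if i in selected:
            continue
        if any(abs(pearson(col, columns[s])) >= threshold for s in selected):
            extra.append(i)
    return extra
```

For instance, if peak 1 is a near-perfect multiple of already-selected peak 0 while peak 2 is uncorrelated, only peak 1 is pulled in.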
This kind of flexibility in a metabolomics software package is a top priority for researchers since the data comes in a wide variety of flavors. "Metabolomics is coming from very different platforms. All the analytical chemistry can be applied from the last 200 years, and there's no standard of what is good data and what is not publishable data, because editors tend to publish everything," says Oliver Fiehn, an associate professor in molecular and cellular biology at the University of California, Davis. "The problem is that these machines are not integrated and you don't have all the benefits of all the machines in one type of instrument."
Fiehn has been in the trenches of developing databases and computational tools for metabolomics for more than a decade, but says he is still very much where he started. "The idea of metabolomics is to look at the metabolome, and while we can do it easily for compounds that are known, for those novel compounds that might even be conjugates of two known compounds, it is much, much harder," he says.
Last year, Fiehn's lab contributed to a metabolomics study of 17 different white wines conducted by Kirsten Skogerson. For this study, Skogerson analyzed all of her mass spec data using BinBase, a freeware database application developed in Fiehn's lab whose underlying algorithm removes noise peaks and inconsistent signals. While many researchers have churned out a bevy of homegrown computational solutions like BinBase, Fiehn says there is still a huge need for the metabolomics community to continue developing novel algorithms to help identify unknown compounds. "We have published a couple of algorithms, and yes, we move forward, but it's a slow process because it's scientifically difficult," he says. "All other databases just don't have the capability to combine and query across studies, as other databases do simple alignment, which means it works for one study, but you cannot compare the results to another study. … Alignment is not the way to go, in my opinion."
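The kind of consistency filtering Fiehn's lab builds into BinBase can be illustrated in miniature. This is not the published BinBase algorithm, only a sketch of the principle that peaks detected sporadically across samples are likely noise; the function name and the 80 percent presence cutoff are assumptions for the example.

```python
from collections import Counter

def filter_consistent_peaks(peak_tables, min_fraction=0.8):
    """peak_tables: one dict per sample, mapping a peak identifier
    (e.g. a retention-index/mass bin) to its intensity. Peaks detected
    in fewer than min_fraction of the samples are treated as noise or
    inconsistent signals and dropped from every table."""
    counts = Counter(peak for table in peak_tables for peak in table)
    n = len(peak_tables)
    keep = {p for p, c in counts.items() if c / n >= min_fraction}
    return [{p: v for p, v in table.items() if p in keep}
            for table in peak_tables]
```

With three samples and an 0.8 cutoff, a peak seen in all three survives while one seen in a single sample is discarded.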
At the Center for Biotechnology at Bielefeld University, Heiko Neuweger and his colleagues recently released MeltDB, an open source framework intended to provide comparative analysis methods for raw mass spec datasets. A tool pipeline allows both the import of preprocessed data and the integration of existing open source analysis packages such as XCMS, MassSpecWavelet, or metaB. But what makes MeltDB unique is that it allows researchers to expand analysis across a cluster at the university. "There are several tools available, but we had the idea to connect the actual computation to a compute cluster, and this is something that we did not yet see in any other tools," says Neuweger. "This was the main idea, to make it platform independent, make it a Web browser-based application, and then do the complete computation in the back."
Neuweger says that the metabolomics community should definitely step up its efforts in algorithm development. This was partly why MeltDB was made to be such an open platform: to increase data sharing and algorithm development across the community. "Of course, new algorithms or better algorithms could clearly benefit the way the data is analyzed," says Neuweger. "That was the idea with MeltDB … that we construct a platform that makes it easy to try out new algorithms and integrate other existing tools."
Though visualization software tools have been the application du jour in the genomics community, Gary Siuzdak, senior director at the Scripps Center for Mass Spectrometry, says that this kind of analysis is not the silver bullet for metabolomics. "I think the visualization tools are actually great, but unfortunately they are not telling you a whole lot about what's going on biochemically," he says. "If your program can give you a good statistical evaluation of which molecules are changing significantly, that basically gives the start so you can tell what molecule is worth identifying."
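The statistical screen Siuzdak describes can be sketched as a per-metabolite significance test. The permutation test below is an illustrative stand-in, not the Scripps lab's actual pipeline; it enumerates every relabeling of the samples, which is feasible only for the small group sizes of a pilot study, and in practice the resulting p-values for thousands of metabolites would also need multiple-testing correction.

```python
import itertools
from statistics import fmean

def permutation_pvalue(a, b):
    """Two-sided permutation test on the difference of group means for
    one metabolite. Enumerates all ways of splitting the pooled samples
    into groups of the original sizes and counts how often the absolute
    mean difference is at least as extreme as the observed one."""
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n = len(a)
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), n):
        grp = [pooled[i] for i in idx]
        rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
        if abs(fmean(grp) - fmean(rest)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total
```

A metabolite whose intensities separate cleanly between conditions gets the smallest attainable p-value, flagging it as "worth identifying" in Siuzdak's sense.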
Databases
As far as data repositories are concerned, this younger cousin of proteomics and genomics is understandably lagging. "It's really at the very early stages of putting these databases together. There's at least on the order of tens of thousands of these molecules of endogenous metabolites," says Siuzdak. "I think the most frustrating part of the databases right now is not their usability, it's that it's very common for us to see a molecule that's very interesting and has great statistics — P values of less than 10⁻⁵ — but ultimately we're not able to identify it because it's totally unique."
Still, significant efforts have been made on the small molecule database front, such as the Siuzdak lab's mass spectra metabolite database, METLIN, as well as the Human Metabolome Database, hosted by the University of Alberta. But they have a long way to go in terms of approaching a comprehensive set of human metabolites. "Both of our databases agree that the number of these metabolites which we've identified is only on the order of about 2,500, not to mention that you have to consider exogenous molecules as well," Siuzdak says. "Our lab … had a paper recently come out where we had spent over nine months identifying one molecule — so that's the real limitation right now in terms of metabolomics: the vast majority of molecules out there that we're observing, we have no idea what they are."
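Underneath databases such as METLIN and the Human Metabolome Database, the basic query is a mass lookup within a tolerance. The sketch below shows the idea with a hypothetical two-entry database and a parts-per-million tolerance; it does not reflect either database's actual API, and the entries are illustrative values supplied by the caller.

```python
def match_mass(observed_mz, database, tol_ppm=5.0):
    """database: list of (name, monoisotopic_mass) pairs. Returns the
    names of candidates whose mass lies within tol_ppm parts-per-million
    of the observed value -- the core query behind metabolite database
    searches. An empty result is the "totally unique" case Siuzdak
    describes: an interesting peak with no database match."""
    hits = []
    for name, mass in database:
        if abs(observed_mz - mass) / mass * 1e6 <= tol_ppm:
            hits.append(name)
    return hits
```

At a 5 ppm tolerance, an observed mass of 180.0637 matches a 180.0634 entry (about 1.7 ppm off) but nothing else.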
There is also a need for appropriate text mining tools, Fiehn says. "For example, if a researcher is working with five unidentified metabolites in rice, the first question is how many compounds are known in rice. And that is already a question that is hard because you cannot infer it from the genome itself because many, many more small molecules are known than we have known enzymes converting them, so you would have tons of compounds without enzymes attached to them," he says. "So basically what you have to do in metabolomics is go back to literature, and there are, say for rice, 50,000 papers published on rice with small molecules with metabolites. But these are not available to the researcher in database format — they are available one by one by one."
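The literature-mining gap Fiehn points to can be illustrated with the simplest possible approach: dictionary matching of compound names against paper abstracts. Real tools need synonym lists and disambiguation; the function name and sample texts here are made-up examples, not any existing system.

```python
import re

def compound_mentions(abstracts, compound_names):
    """Count how many abstracts mention each compound via crude
    case-insensitive whole-word dictionary matching -- a toy stand-in
    for turning one-by-one literature into something queryable."""
    patterns = {name: re.compile(r'\b' + re.escape(name) + r'\b', re.I)
                for name in compound_names}
    counts = {name: 0 for name in compound_names}
    for text in abstracts:
        for name, pat in patterns.items():
            if pat.search(text):
                counts[name] += 1
    return counts
```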