PALM SPRINGS, California – Combining various types of omics datasets can give investigators a fuller picture of what is going on in biological samples.
This multi-omic approach can be applied to a range of studies, including efforts to better understand the development of the human gut microbiome, predict outcomes among stable ischemic heart disease patients, and gain insight into personal health, speakers at this year's Association of Biomolecular Resource Facilities meeting said. However, they noted that the approach isn't without challenges.
"Multiple levels of data can really help provide a lot of insights, disease mechanisms of biology, that one data type alone cannot provide," Kelly Ruggles from New York University Langone Health said during the session.
Oak Ridge National Laboratory's Robert Hettich, for example, has been combining omics tools to study the proteins produced by the human gut microbiome. In particular, he and his colleagues have been examining the fecal metaproteome of 94 preterm infants over time using LC-MS/MS. They are studying this group because when infants are born they lack a gut microbiome, but they begin to develop it within a week or two of birth.
The makeup of the infants' gut microbiomes varied over time and also differed from one infant to another, even between two siblings. But many proteins expressed by the infants' fecal microbiomes were involved in similar metabolic pathways, suggesting there is conservation at the functional level.
Meanwhile, Ruggles, as part of the ISCHEMIA trial at NYU, is integrating omics data to identify a molecular signature to help manage patients with stable ischemic heart disease. The broader trial has enrolled more than 5,000 patients and has also developed a repository of whole blood DNA, RNA, plasma, and serum samples from about 1,000 patients that can be used to amass a range of omics datasets. So far, she said, the researchers have generated methylation array and RNA-seq data and hope to get targeted proteomics and metabolomics data soon.
Stable ischemic heart disease patients currently are stratified based on their clinical features, but the study researchers hypothesize that molecular markers identified in this way might better predict who is at risk for having a heart attack or dying. "If we find people who are most at risk, then we can really target intensive therapies at the right cohort of people," she added.
Other researchers, like Stanford University's Tejaswini Mishra, who is in Michael Snyder's lab there, are adding further layers of data. The Snyder lab, which in 2012 published a personal omics profile of Snyder, has been collecting not only genomic, proteomic, and metabolomic data, but also data from wearables and clinical tests. They are now following more than 100 people over time, gathering a wide range of data.
The hope there, Mishra said, is to generate a picture of baseline health and see how it changes when someone is sick. "Longitudinal profiles are going to be very valuable for understanding personal disease states," she added.
New approaches may also be needed to grapple with these datasets. NYU's Ruggles and her colleagues, for instance, developed a tool called BlackSheep for outlier analysis, which she describes as a sort of alternative to differential expression analysis for large cohorts for which there's lots of omics data. They likewise are working on a tool called PhosphoDisco to collapse phosphorylated sites from mass spec data into co-regulated modules and then connect those to the activity of various kinases or clinical variables.
Similarly, the National Institutes of Health's Ewy Mathé built a database called Relational database of Metabolomics Pathways (RaMP) that incorporates biological pathways from the Kyoto Encyclopedia of Genes and Genomes, Reactome, WikiPathways, and the Human Metabolome Database to enable pathway-level analyses.
But there remain a number of unknowns, particularly among proteomic and metabolic datasets. Oak Ridge's Hettich noted that a large portion of proteins are not annotated. "There are unknown proteins that are very specifically related to health or disease, but they don't have a name," he said. "And so, by and large, most people who do metabolic mapping throw them in the garbage can, which is a great concern to me."
Mathé added that there is also the issue of standards, as most standards are currently data type-specific. "But then when we're trying to actually throw this data together, it ends up requiring a lot of legwork to be able to make it work and to make things to be compatible," she said.
She is part of two efforts aimed at addressing some of those issues. One, from the Software Data Exchange (SODA), which is part of the Metabolomics Association of North America, is developing a list of maintained software and of test datasets that can be used to benchmark software. The other, through Consortium of Metabolomic Studies (COMETS) Analytics, aims to standardize, for instance, how investigators code their variables or name their metabolites, and to embed those standards into the consortium-level software researchers are using.