New Method Aims for Species-level Resolution of Metagenome Data

NEW YORK (GenomeWeb News) – With a new approach, researchers led by the European Molecular Biology Laboratory's Peer Bork captured species-level abundance data of known and unknown species from metagenomic samples, as they reported in Nature Methods yesterday.

Often, 16S ribosomal RNA gene amplification and sequencing is used to classify which prokaryotes are present in an environmental sample. Alternatively, shotgun metagenomic sequencing can determine what species are present by aligning the short reads it generates to reference genomes.

Both approaches, though, have their drawbacks, Bork and his colleagues noted. The 16S rDNA approach is fraught with biases brought on by the presence of copy-number variants, differences in amplification efficiencies, and more, while shotgun sequencing-based identification is reliant upon the availability of a reference genome. Either way, many species may go uncatalogued.

Their new approach, they said, can resolve taxa without sequenced genomes.

"The main novelty of our method is to resolve this single unassigned fraction into species-level taxonomic abundances," Bork and his colleagues wrote.

Further, they reported that by using their metagenomic operational taxonomic unit-based method, they were able to get a more accurate glimpse of the community structure of the human gut, both in health and disease.

The approach they presented in Nature Methods is based on universal, single-copy marker genes: sequences of these genes from both metagenomic samples and reference genomes are clustered into metagenomic operational taxonomic units, or mOTUs.
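As a rough illustration of what clustering marker-gene sequences into taxonomic units can look like, here is a minimal Python sketch. It is not the researchers' method: the greedy strategy, the 0.9 identity threshold, the pre-aligned toy sequences, and every name in it (`identity`, `greedy_cluster`, `toy`) are assumptions made only for the example.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.9):
    """Put each sequence in the first cluster whose representative it matches
    at or above the threshold; otherwise start a new cluster."""
    clusters = []  # list of (representative sequence, [member names])
    for name, seq in seqs.items():
        for rep, members in clusters:
            if identity(seq, rep) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so this sequence founds one.
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


# Hypothetical toy data: one reference-genome sequence and three metagenomic ones.
toy = {
    "ref_genome_A": "ACGTACGTAC",
    "meta_seq_1": "ACGTACGTAT",  # differs from ref_genome_A at one position
    "meta_seq_2": "TTGCATGCAA",  # unlike the reference
    "meta_seq_3": "TTGCATGCAT",  # differs from meta_seq_2 at one position
}
clusters = greedy_cluster(toy, threshold=0.9)
print(clusters)
```

In this toy run, the second cluster contains only metagenomic sequences, which is the kind of unit that would stand for a species without a reference genome.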

Starting with 40 marker genes that had previously been used to differentiate between prokaryotic species, Bork and his colleagues calibrated and tweaked them into a hidden Markov model-based algorithm, which they tested on nearly 3,500 prokaryotic reference genomes and about 260 published human gut metagenome samples. By examining the performance of each of those 40 marker genes, the researchers narrowed the set down to the 10 that performed best.

According to the researchers, these 10 genes had an average false-discovery rate of 1.4 percent and a mean ambiguous read alignment rate of 3.5 percent. By comparison, the researchers added, the 16S rDNA approach had a mean ambiguous read alignment rate of 41.1 percent.

Clustering, Bork and his colleagues said, enabled them to determine the fraction of species in the human gut that weren't covered by a reference genome, a figure they placed at some 58 percent.

"This implies that the majority of species in human gut microbial samples are not represented by current genomic resources, despite the substantial efforts that have gone into targeted genome sequencing projects with the goal of improving phylogenomic representation," they noted, adding that unsequenced human gut species likely play key roles in the gut ecosystem.

Additionally, they found that 12 of the 30 most abundant gut species lacked a representative genome sequence. Further, by constructing a maximum likelihood reference tree based on more than 1,750 species clusters, they determined that most mOTU linkage groups, or mOTU-LGs, without species-level annotations typically belonged to the Firmicutes or Bacteroidetes.

Drawing on a dataset from 207 people — 110 Europeans sampled once, 97 people from the US, including 57 who were sampled once, 41 who were sampled twice, and two people who were sampled three times — Bork and his colleagues evaluated how well their mOTU-LG approach could profile the abundance levels of species lacking a reference genome. From this, they found that for the mOTU-LG method, 98 percent of the samples matched to a different sample from the same person, as compared to 92 percent and 86 percent for two reference genome-based approaches.
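One way to picture the sample-matching evaluation is as a nearest-neighbor check: for each sample from a person sampled more than once, find the most similar other sample and see whether it belongs to the same person. The Python sketch below assumes Bray-Curtis dissimilarity and made-up abundance profiles; the article does not say which distance measure the researchers used, and all names here are hypothetical.

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles of equal length."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den if den else 0.0


def fraction_matched(profiles, owner):
    """Among samples whose owner has at least two samples, return the fraction
    whose nearest other sample comes from the same person."""
    hits = total = 0
    for sample, profile in profiles.items():
        n_from_owner = sum(1 for o in owner.values() if o == owner[sample])
        if n_from_owner < 2:
            continue  # people sampled once have nothing to match against
        nearest = min(
            (other for other in profiles if other != sample),
            key=lambda other: bray_curtis(profile, profiles[other]),
        )
        total += 1
        hits += owner[nearest] == owner[sample]
    return hits / total if total else 0.0


# Hypothetical relative abundances of three species in five samples.
profiles = {
    "p1_visit1": [0.60, 0.30, 0.10],
    "p1_visit2": [0.55, 0.35, 0.10],
    "p2_visit1": [0.10, 0.20, 0.70],
    "p2_visit2": [0.15, 0.15, 0.70],
    "p3_visit1": [0.30, 0.40, 0.30],
}
owner = {name: name.split("_")[0] for name in profiles}
matched = fraction_matched(profiles, owner)
print(matched)
```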

Bork and his colleagues also examined how their tool could gauge species diversity.

By comparing fecal samples from 97 asymptomatic people from the US, 85 asymptomatic people from the EU, and 25 inflammatory bowel disease patients, they found significant differences between each dataset. While the US individuals had the lowest species diversity, followed by the IBD patients and the European individuals, Bork and his colleagues pointed out that some of the differences could be due to methodological variations in how samples were collected and handled. They added that the International Human Microbiome Standards consortium is working on the issue of standard protocols.
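The article does not name the diversity measure used. As one common example of how species diversity can be computed from abundance data, the Python sketch below calculates the Shannon index for hypothetical species counts; the choice of index, the function name `shannon_index`, and the numbers are all assumptions for illustration.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical counts for four species in two samples.
even_sample = [25, 25, 25, 25]   # all species equally abundant
skewed_sample = [97, 1, 1, 1]    # one species dominates

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
print(h_even > h_skewed)
```

With the same number of species, the evenly distributed sample scores higher than the one dominated by a single species, which is why comparisons of this kind are sensitive to how samples were collected and handled.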

They also studied the fecal diversity of people with ulcerative colitis as compared to healthy controls. They found that colitis patients had differentially abundant Firmicutes — possibly from the order Clostridiales — that were divergent from any reference genome. In addition, 11 species, including Bifidobacterium bifidum, Bacteroides intestinalis, and Akkermansia muciniphila, were present at different levels in patients and controls.

"This result illustrates the practical utility of our method and underscores the importance of profiling currently unknown species and the need for sequencing additional genomes to better understand the functional role of these microorganisms in the human gut ecosystem," Bork and his colleagues said.

An implementation of the method is available here.