NEW YORK (GenomeWeb) – Researchers from the University of California, San Francisco and the Gladstone Institutes have developed a new integrated computational pipeline — the Metagenomic Intra-Species Diversity Analysis System (MIDAS) — to quantify bacterial species abundance and strain-level genomic variation, including gene content and SNPs, from shotgun metagenomes.
Several recent studies have analyzed the differences within bacterial species as a way of gauging the evolution of microbes on Earth, the researchers wrote in their paper, published today in Genome Research. Further, they added, "an understanding of strain-level variation is critical for studying the interaction of microbes with humans and for understanding microbial pathogenicity. Differences at the nucleotide level can lead to within-host adaptation of pathogens, and differences in gene content can confer drug resistance, convert a commensal bacterium into a pathogen, or lead to outbreaks of highly virulent strains."
Various research teams have used metagenomic shotgun sequencing to gain insight into strain-level heterogeneity among bacterial genomes within and between microbial communities. The method has produced genomic resolution not achievable by 16S ribosomal RNA sequencing alone, according to the team. However, metagenomic sequencing is also limited by existing computational methods and reference databases.
"Assembly-free methods that map reads to reference genomes in order to estimate the relative abundance of known strains are effective for well-characterized pathogens like E. coli that have thousands of sequenced genomes," the authors wrote. "However, such methods cannot detect strain-level variation for the vast majority of known species that currently have only a single sequenced representative. Other assembly-free approaches have been developed that use reads mapped to one or more reference genomes to identify SNPs and gene copy-number variants of microbial populations."
The researchers created MIDAS to address some of the problems currently inherent in metagenomic sequencing. They first generated a database of 31,007 high-quality bacterial genomes, and then used a set of 30 informative universal genes to cluster these genomes into defined groups of species.
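As a rough illustration of how genomes might be grouped into species from marker-gene similarity, the following Python sketch links any two genomes whose pairwise identity meets a cutoff. It is not the MIDAS implementation: the genome names, the identity values, the 0.95 cutoff, and the choice of single-linkage clustering are all assumptions made for the example.

```python
def cluster_genomes(genomes, identity, threshold=0.95):
    """Group genomes into clusters by single linkage.

    `identity` maps a pair of genome names to their marker-gene identity
    (hypothetical values); pairs at or above `threshold` are joined.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ident in identity.items():
        if ident >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: three closely related genomes and one outlier.
genomes = ["g1", "g2", "g3", "g4"]
identity = {
    ("g1", "g2"): 0.99,
    ("g2", "g3"): 0.97,
    ("g1", "g3"): 0.96,
    ("g1", "g4"): 0.81,
    ("g2", "g4"): 0.80,
    ("g3", "g4"): 0.80,
}
species_groups = cluster_genomes(genomes, identity)
print(species_groups)
```

With these made-up values, g1, g2, and g3 fall into one group and g4 forms its own.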
Using a shotgun metagenome, the team found that MIDAS rapidly and automatically quantifies gene content and identifies SNPs in bacterial species, and that it is accurate for populations with at least 1x sequencing coverage for gene content and at least 10x for SNPs. They found microbial community structures that were missed by metagenomics analysis at a coarser taxonomic resolution. They were also able to assign species labels to 2,666 genomes that previously lacked a species-level annotation (8.6 percent of the total), and reassigned species labels for 3,035 genomes (9.8 percent of the total).
They then validated MIDAS using 20 mock metagenomes that they created by pooling Illumina reads from 237 completed genome sequencing projects. Using these data, the researchers found that MIDAS accurately estimated the relative abundance of bacterial species, but slightly underestimated sequencing coverage. MIDAS also had a low false-discovery rate for SNPs, but required between 5x and 10x coverage to identify the majority of SNPs present.
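To make the link between sequencing coverage and relative abundance concrete, here is a minimal Python sketch. It assumes coverage is computed as mapped bases divided by the length of the reference region, and relative abundance as each species' share of total coverage. The species names and numbers are hypothetical, and this is not necessarily how MIDAS performs the calculation.

```python
def species_coverage(mapped_bases, reference_length):
    """Average depth: bases mapped to a species divided by reference length."""
    return mapped_bases / reference_length


def relative_abundance(coverage_by_species):
    """Normalize per-species coverage so the values sum to 1."""
    total = sum(coverage_by_species.values())
    if total == 0:
        return {species: 0.0 for species in coverage_by_species}
    return {species: cov / total for species, cov in coverage_by_species.items()}


# Hypothetical input: (mapped bases, reference length in bases) per species.
mapped = {
    "species_A": (30_000, 10_000),
    "species_B": (10_000, 10_000),
}
coverage = {name: species_coverage(b, n) for name, (b, n) in mapped.items()}
abundance = relative_abundance(coverage)
print(coverage)
print(abundance)
```

In this toy case species_A has 3x coverage and species_B has 1x, so neither would reach the 5x to 10x range the researchers reported as necessary to identify most SNPs.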
The researchers then applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants, and used it to quantify the gene content of prevalent bacterial species in 198 globally distributed marine metagenomes.
By examining marker alleles unique to the mothers, the researchers found that early colonizing strains are transferred from the mother to the child, but that late colonizing strains are likely acquired from the environment. "Strain-level variants reveal patterns that contradict what one would assume from patterns at the species level," said first author Stephen Nayfach, a graduate student at UCSF, in a statement.
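The idea of tracking alleles unique to a mother can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' method: alleles are represented as plain strings, a marker allele is defined here as one seen in a given mother and in no other mother, and all sample names and alleles are invented.

```python
def mother_specific_alleles(mother, alleles_by_mother):
    """Return alleles found in `mother` and in no other mother."""
    others = set().union(
        *(alleles for name, alleles in alleles_by_mother.items() if name != mother)
    )
    return alleles_by_mother[mother] - others


def sharing_rate(marker_alleles, infant_alleles):
    """Fraction of the mother's marker alleles also seen in the infant."""
    if not marker_alleles:
        return None  # no marker alleles, so the rate is undefined
    return len(marker_alleles & infant_alleles) / len(marker_alleles)


# Hypothetical allele calls for one bacterial species.
alleles_by_mother = {
    "mother_1": {"snp1:A", "snp2:T", "snp3:G", "snp4:C"},
    "mother_2": {"snp1:A", "snp5:G"},
}
infant_1 = {"snp2:T", "snp3:G", "snp6:A"}

markers = mother_specific_alleles("mother_1", alleles_by_mother)
rate = sharing_rate(markers, infant_1)
print(sorted(markers), rate)
```

A high sharing rate would be consistent with transfer of a strain from mother to infant, while a low rate for the same species would point toward a different source.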
"The maturation of the infant gut microbiome over the first year gives the impression of ongoing transmissions from the mother," added senior author Katherine Pollard, of UCSF and the Gladstone Institutes. "But the genetic variants in the bacteria show that the acquired strains are not the same as the mother's."
As for the marine metagenomes, MIDAS analysis suggested that differences in gene content among marine bacteria were associated with geography. But further work will be needed to determine whether these differences are a result of adaptation or genetic drift. "The next big challenge is to disentangle the forces that drive population structure in the microbiome and to associate this variability with traits of the host or environment," Pollard added.