NEW YORK (GenomeWeb) — Researchers at the University of Washington have repurposed a next-gen sequencing protocol to pick apart and assemble individual microbe genomes from a metagenomic sample.
As reported in G3: Genes Genomes Genetics in May, the team, led by Jay Shendure and Maitreya Dunham, associate professors of genome sciences at the University of Washington, showed that a method known as Hi-C could generate probability maps of a synthetic metagenome sample to enable the assembly of the individual genomes in the sample. They also demonstrated that the method could be applied to samples containing a mixture of both microbial and eukaryotic genomes.
A group led by Job Dekker and Eric Lander from the University of Massachusetts Medical School and the Broad Institute developed Hi-C in 2009. The method uses proximity ligation and next-generation sequencing to construct 1-Mb resolution spatial maps of whole genomes in order to study their three-dimensional architecture.
It has since been adapted by other groups to generate chromosome-scale haplotype maps, to do de novo assembly of mammalian genomes, and to study chromosome structure of single cells.
Shendure, who also led the group that used Hi-C sequence data to assemble mammalian genomes, said that his group's original study demonstrated that the method was useful for single genomes, so he next wanted to test it on samples containing multiple genomes.
"Metagenome assembly is challenging with conventional methods," Shendure said. "Many genomes are present at different abundances, and you don't always know how many species are present." Currently, most researchers do metagenome assembly by first doing shotgun sequencing and then applying algorithms especially designed for metagenome assembly. However, he said, most of those methods result in short contigs without long-range information to piece those contigs together. In addition, the authors wrote in the G3 study that library construction methods remove the long-range contiguity information, making assembly of any one genome difficult.
But because the Hi-C method "provides a signal of contiguity that is completely intracellular and contains both intra- and interchromosomal information," it can be used to reconstruct the individual microbial genomes, the authors wrote.
The method is "a very elegant way to try and figure out what species are present in a given mixture," according to Siddarth Selvaraj, a graduate student at the University of California, San Diego who was not involved with the study, but previously published a study that used Hi-C data to generate haplotype maps.
Selvaraj added that the G3 publication, along with another recent publication that used Hi-C to deconvolute metagenomic samples and the previous publications that used Hi-C for haplotyping and assembly, demonstrate that Hi-C is a robust and versatile method that has many other applications aside from its original purpose of studying 3D structure.
The main addition the UW team made to the Hi-C protocol was to develop a new algorithm called MetaPhase, which essentially clusters content according to which organism it belongs to, Shendure explained. Then, to assemble each of the specific genomes, the researchers apply the algorithm Lachesis, which they described in their previous paper.
The UW group first tested the method on a synthetic metagenomic sample consisting of 16 yeast strains that included four strains of Saccharomyces cerevisiae and 12 other species of Ascomycetes, all of which have publicly available reference genomes. They compared the Hi-C approach to standard methods for metagenomic sequencing and assembly.
For the metagenome assembly, the researchers did shotgun sequencing and mate-pair sequencing. The assembly had 48,511 contigs with an N50 contig length of 17.3 kb.
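For readers unfamiliar with the N50 statistic quoted above, it is the contig length at which contigs of that length or longer account for at least half of the assembled bases. A minimal Python sketch follows; the function name and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not data from the G3 paper).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

In this toy case the two longest contigs (80 and 70) already cover more than half of the 290 total bases, so the N50 is 70.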
Next, they aligned the Hi-C reads to the metagenome assembly and used the Hi-C structural information to further determine which sequences were from the same cell to cluster the contigs.
MetaPhase predicted 12 clusters, which matched closely with the 12 species present in the draft assembly. Additionally, the vast majority of the sequence, 82 percent, fit into one of the 12 clusters.
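The general idea of grouping contigs by shared Hi-C links can be illustrated with a toy sketch. This is not the MetaPhase implementation: the average-linkage merging, the normalization by contig length, the function names, and the contig data are all assumptions made for illustration. The sketch also takes the number of clusters as an input, whereas the article says MetaPhase predicted it.

```python
from itertools import combinations


def cluster_contigs(lengths, links, n_clusters):
    """Greedy agglomerative clustering of contigs by Hi-C link density.

    lengths: dict mapping contig name -> length in bp
    links:   dict mapping (contig_a, contig_b) -> number of Hi-C read
             pairs joining the two contigs
    n_clusters: number of clusters to stop at (assumed known here)
    """
    # Make pair lookup independent of the order of the two names.
    pair_links = {frozenset(pair): count for pair, count in links.items()}

    def density(a, b):
        # Assumed normalization: link count divided by the product of
        # contig lengths, so long contigs do not dominate.
        count = pair_links.get(frozenset((a, b)), 0)
        return count / (lengths[a] * lengths[b])

    def avg_linkage(c1, c2):
        # Mean pairwise density between two clusters.
        total = sum(density(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [{name} for name in lengths]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the highest average link density.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical contigs: c1/c2 share many links, as do c3/c4.
toy_lengths = {"c1": 10_000, "c2": 12_000, "c3": 9_000, "c4": 11_000}
toy_links = {("c1", "c2"): 40, ("c3", "c4"): 35,
             ("c1", "c3"): 1, ("c2", "c4"): 2}
toy_clusters = cluster_contigs(toy_lengths, toy_links, n_clusters=2)
```

Because Hi-C links form only between DNA inside the same cell, contigs from one organism share far more links with each other than with contigs from other organisms. In this toy input that separates c1 and c2 from c3 and c4.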
Further examining the clusters illustrated that some species had "greater Hi-C link densities than others" and closely related species were more difficult to separate than distantly related ones. For instance, the four strains of S. cerevisiae were assigned to one cluster, as expected, but further elucidating the individual strains would have required additional algorithmic development, the authors wrote.
The team next tried to scaffold the genomic content of each yeast species from the clusters of contigs by running them through the Lachesis algorithm. Lachesis generated chromosome-scale scaffolds for each of the eight S. stipitis chromosomes. The scaffolds matched the reference genome assembly with a few clustering errors. One chromosomal cluster had telomeric sequences from four other chromosomes.
After demonstrating that the method could work on mixed samples of yeast, the researchers wanted to see whether the Hi-C method and MetaPhase algorithm could deconvolute a metagenomic sample with both eukaryotic and prokaryotic species.
The team created a sample containing eight yeasts, nine bacteria, and one archaeon. Hi-C sequencing and MetaPhase predicted 18 clusters, grouping 89 percent of the contigs into the 18 clusters, which matched the 18 species in the sample. Over 99 percent of the contigs clustered correctly, and those corresponding to archaeal and bacterial species had an accuracy rate of 99.87 percent.
Finally, the researchers used Hi-C to scaffold the genomic content of the prokaryotic species from clustered contigs. The prokaryotic genomes had a weaker signal in the Hi-C data than the eukaryotic genomes, making it more difficult to orient the genomic content within chromosomes. Nonetheless, the researchers were still able to separate out the chromosome and plasmid-derived sequence within an individual prokaryotic genome.
"Capturing the three-dimensional genomic interactions within a cell is a new and different approach to obtain full genomes from the cells through metagenomics, and it will be a valuable tool in addition to other methods," Per Nielsen, head of the Center for Microbial Communities at Aalborg University in Denmark, told IS. "For natural complex microbial communities, however, there may be problems if the strain micro-diversity is large, but that is a well-known obstacle for all present methods based on metagenomes."
Shendure added that the group's next step will be to test the method on actual environmental samples rather than synthetic metagenomes. He said that he plans to start with a sample with "low to modest complexity with a finite number of species" and will then move on to metagenomes that are more complex.