Researchers from the J. Craig Venter Institute, using a hybrid approach that combined methods for single-cell sequencing with metagenomics, have assembled an almost-complete genome for a previously unknown bacterial phylum — TM6 — living in the biofilm on a hospital sink drain.
The group described the technique and results in a paper appearing online ahead of print this week in the Proceedings of the National Academy of Sciences. The authors present the method as a tool for recovering genomes of low-abundance bacteria in mixed samples, such as the study's newly sequenced TM6, a representative of a potentially vast group of uncultured phyla colloquially called the "dark matter" of life.
In the case of the group's PNAS study, the method yielded an approximately 90 percent-complete genome for TM6.
Jeffrey McLean, the study's first author, told In Sequence that the group hit on the hybrid method after looking at the probability of capturing such low-abundance species using current single-cell sequencing methods.
"It looked like it was not really economical to sort a single-cell in every well," McLean said. "The likelihood of capturing a bacterium at 0.1 percent abundance would be [about] 40 percent if you did 1,000 wells. But we saw that if we sorted 100 cells [into each well], we would have about the same probability of capture but could do it in just 10 wells."
McLean added that the team believes that using fewer groups of 100 cells, rather than a larger number of single cells, should also reduce some of the variability that can affect multiple displacement amplification, or MDA, further improving the likelihood of recovering near-complete genomes for low-abundance bacteria.
"We know that with MDA, even if you did a lab strain of E. coli you might get anywhere from 10 to 100 percent of the genome. There is a lot of variability [with MDA] and it is not well understood why that occurs," McLean said.
"But if you have five E. coli then you are very likely to get a full genome. So by that token if we had 100 cells in a well from an environmental sample, it's more likely we might have more than one [of a particular low-abundance species] and we'd likely get better assembly," he explained.
"All these factors came into account in deciding to try this approach."
In the study, McLean and his colleagues described applying their hybrid approach to biofilm samples from a sink drain in a hospital restroom. The method used flow sorting to pool random groups of 100 cells into individual wells. The DNA from these cells was then amplified using MDA, and sequenced, creating a "mini-metagenome" for each 100-cell group.
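McLean's reasoning about capture probability can be illustrated with a rough calculation. The Python sketch below is not taken from the study. It assumes every sorted cell is an independent random draw from the community and that the target species makes up a fixed fraction of cells. The function name and parameters are hypothetical. Under this idealized model, 1,000 single-cell wells and 10 wells of 100 cells give the same chance of capturing the target at least once, because the total number of sorted cells is the same. The absolute value the model produces (about 63 percent) depends on those simplifying assumptions and need not match the figure McLean quoted.

```python
def capture_probability(abundance: float, wells: int, cells_per_well: int) -> float:
    """Chance that at least one sorted cell belongs to the target species.

    Illustrative model only: assumes each sorted cell is independently
    the target with probability equal to its relative abundance.
    """
    total_cells = wells * cells_per_well
    return 1.0 - (1.0 - abundance) ** total_cells


# Target at 0.1 percent abundance, as in McLean's example.
single_cell = capture_probability(0.001, wells=1000, cells_per_well=1)
pooled = capture_probability(0.001, wells=10, cells_per_well=100)

print(f"1,000 wells x 1 cell:   {single_cell:.3f}")
print(f"10 wells x 100 cells:   {pooled:.3f}")
```

The point of the comparison is the well count: the pooled design reaches the same capture probability with one-hundredth as many amplification and sequencing reactions.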
Each well was sequenced using both the Illumina GAII and the Roche 454 platforms, according to the study authors.
"There are different metagenomic approaches, but overall, it's pretty well established now that it's feasible, so we weren’t afraid of trying it," McLean said.
The group used an assembler called SPAdes and computational contig binning strategies, including the MGTAXA software, to identify contigs belonging to TM6 and to reconstruct its genome.
"We had worked with [SPAdes] before on single-cell assembly, and then as it turns out, it was very good at assembling even non-single cell genomes and it did a really good job assembling this candidate phylum out of this small pool of cells," McLean said.
"Once you have good, long contigs, the approach is to try to identify and classify them," McLean explained. "In some cases it's easy because it looks like something that has been sequenced before, but in this case it was a candidate phylum for which there's nothing in the literature yet."
McLean and his team then looked for 16S sequences that serve as a known phylogenetic marker for TM6, finding them in DNA from three different wells containing 100-cell MDA-amplified groups.
"We looked for this 16S gene that we knew belonged to this candidate phylum in any one of these contigs. We found one … and then used this tool developed at JCVI called MGTAXA."
First, the group found a 273-kb contig containing a portion of the 16S sequence known to mark the phylum. Using this as a starting point, the researchers were able to collect other contigs that also belonged to TM6, for a total of 1.07 Mb in seven contigs, which they estimated to represent at least 90 percent of the TM6 genome.
"We were able to tell [MGTAXA] what the contig of interest was," McLean explained. "It looks at the frequency of k-mers, the nucleotide patterns, and it makes a model, so we trained the program and said, 'find other contigs that look like this first one,' and it was able to pull out the other seven contigs that made the majority of the [TM6] genome."
According to McLean, the group's estimate that they were able to assemble 90 percent of the TM6 genome was a conservative one. The assembly could actually "very well be complete," he said.
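The k-mer-based recruitment McLean describes, in which a seed contig is used to find other contigs with similar nucleotide patterns, can be sketched in a few lines of Python. This is a toy illustration and not MGTAXA's actual model or interface. It assumes tetranucleotide (4-mer) frequencies and cosine similarity as the comparison. The function names and the example sequences are invented for demonstration.

```python
import math
from collections import Counter
from itertools import product

K = 4  # tetranucleotides; an assumed choice for this sketch
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq: str, k: int = K) -> list[float]:
    """Return normalized k-mer frequencies, ignoring k-mers with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two frequency vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_contigs(seed: str, contigs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank contigs by similarity of their k-mer profile to the seed contig."""
    seed_profile = kmer_profile(seed)
    scored = [
        (name, cosine_similarity(seed_profile, kmer_profile(seq)))
        for name, seq in contigs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented toy sequences: one AT-rich contig like the seed, one GC-rich contig.
seed_contig = "ATATTTAAATTA" * 20
candidates = {
    "at_rich": "TTAAATATATTA" * 20,
    "gc_rich": "GCGGCCGCGGCC" * 20,
}
ranking = rank_contigs(seed_contig, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice a similarity cutoff would have to be chosen to decide which contigs to keep, and real contigs are far longer than these toy strings. The sketch only shows how composition alone can separate sequences from different genomes.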
Comparing TM6 genome drafts from the three different wells that were sequenced, the researchers also found high nucleotide identity, reporting that this provided "strong confirmation" that the team had assembled a correct genomic sequence.
From the near-complete TM6 genome, the team was then able to infer some characteristics of the organism, namely that it is likely Gram-negative and possibly a symbiotic bacterium living alongside an unknown host.
McLean said that the JCVI group believes that the mini-metagenomic approach could be a useful bridge between single-cell approaches and fully metagenomic analyses, and it is planning to apply the method to more hospital samples and potentially other areas of microbiome research.