NEW YORK (GenomeWeb) – A team led by researchers at the US Department of Energy's Joint Genome Institute has successfully created high-resolution phylogenetic profiles for microbial communities.
To achieve this, they performed single-molecule real-time sequencing using the Pacific Biosciences RSII platform to generate full-length 16S rRNA sequences, which the researchers have dubbed PhyloTags.
Since PCR enabled a closer look at genes in 1983, small subunit (16S) ribosomal RNA (rRNA) genes have become the most widely used marker for performing phylogenetic analyses. By examining 16S rRNA genes, researchers can classify novel bacterial and archaeal taxa.
Traditionally, researchers used Sanger sequencing to obtain accurate, long-read 16S rRNA gene sequences. However, even though the technology has advanced, the process remains costly and its low throughput makes it impractical for many applications.
Next-generation sequencing platforms, such as those from Roche 454 and Illumina, offered high-throughput technology at roughly one tenth of the cost of Sanger sequencing, which did help to advance research. But these platforms generate short reads that cannot produce accurate phylogenetic profiles for microbial communities from environmental DNA samples.
The JGI researchers wanted to ascertain how effective it would be to use PacBio's RSII platform to create full-length 16S rRNA gene sequences, a study described in a paper published last week in the ISME Journal. The researchers started by creating a mock community by pooling 23 bacterial and three archaeal species, with known genomes, at varying ratios. Then they performed shotgun sequencing using one SMRT cell on the PacBio RSII platform.
"The first [part of the study] was just a benchmark to try to make sure that what we put in is actually what we then get out at the other end," said Tanja Woyke, lead author and researcher at JGI. Woyke and her team first had to make sure the workflow was error-corrected.
"We know that [with] PacBio there still is a given error rate, but if you can use a constant sequence you can essentially error-correct your data," Woyke told GenomeWeb.
Brett Bowman, a software engineer at PacBio, developed a workflow algorithm, accessible on GitHub, that takes advantage of how SMRTbells on the RSII platform make the samples topologically circular, allowing the platform to sequence the same molecule multiple times, PacBio CSO Jonas Korlach told GenomeWeb. The multiple sub-reads help to build consensus within the sequence and result in high-quality reads, he said.
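The consensus idea Korlach describes can be illustrated with a toy sketch. This is not the PacBio workflow or Bowman's algorithm: it assumes the repeated passes over one molecule have already been aligned to equal length, and it takes a simple per-position majority vote, whereas production consensus callers also handle insertions, deletions, and base quality. The function name `majority_consensus` and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(subreads):
    """Return the per-position majority base across pre-aligned subreads.

    Toy illustration only: assumes every subread has the same length
    and that errors are substitutions, not insertions or deletions.
    """
    if not subreads:
        raise ValueError("need at least one subread")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("subreads must be pre-aligned to equal length")

    consensus = []
    # zip(*subreads) yields one tuple per alignment column.
    for column in zip(*subreads):
        # most_common(1) returns the most frequent base in the column;
        # ties go to whichever base appeared first in the column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same circular template,
# three of which carry a single substitution error each.
passes = [
    "ACGTTGCA",
    "ACGTAGCA",  # error at position 4
    "ACCTTGCA",  # error at position 2
    "ACGTTGCA",
    "ACGTTGGA",  # error at position 6
]

print(majority_consensus(passes))  # prints ACGTTGCA
```

Because each error falls at a different position, every column still has a clear majority, which is why random per-pass errors wash out as the number of passes grows.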
Then the researchers wanted to see what they could do with actual environmental samples, obtained from Sakinaw Lake near the Sunshine Coast of British Columbia, Canada. "We have been studying the lake quite extensively and we know that it has a larger amount of candidate phyla," Woyke said. These phyla are not as extensively cultivated around the world, so it would be a pretty good test of how the PacBio technology would work with lesser-known samples, she explained.
Woyke and her colleagues filtered the Sakinaw Lake samples onto Sterivex filters and then extracted the DNA using previously described techniques. The extracted DNA was amplified and put into DNA libraries using PacBio prep kits. The researchers then used universal bacterial primers to generate 16S rRNA gene sequence amplicons.
The researchers then ran a direct comparison of technologies on the Sakinaw Lake samples, setting the PhyloTags produced on the PacBio system against Illumina V4 rRNA gene sequences (iTags).
They filtered and manipulated the PhyloTag reads using the JGI SMRT portal and a set of tools from MOTHUR, an open source informatics resource for microbial ecology. The researchers analyzed the iTag reads using JGI's iTag analysis pipeline and aligned them using the SILVA database. The research team then filtered and manipulated the sequences using a variety of tools available in the BBMap package for platform-independent community comparisons.
They found that while community structures at the phylum level were comparable between PhyloTags and iTags, PhyloTags produced fewer instances of ambiguous classification. Although the error-correction process associated with producing PhyloTags does create some problems, it produced long-read sequences that allowed greater phylogenetic resolution across multiple taxonomic levels.
"From an overall life perspective, it's really important to have full-length 16S sequences" in order to connect all the different microbes to their part on the tree of life and put them into evolutionary context, Woyke said. "We use 16S rRNA genes quite extensively [in this field]," she added. These genes make up many of the databases that researchers use to create phylogenetic trees from scratch. "It is what we would refer to as the gold standard," she said.
"16S [rRNA gene sequencing] is used fairly widely in both research of microbial communities and industrial biotechnological fields," added PacBio's Korlach. "We are certainly delighted to see [PacBio's technology] used and highlighted [to show] a comprehensive value to these metagenomics communities." He also noted that, aside from the longer read length, one of the greatest advantages of the PacBio platform is its lack of GC content bias, a common weakness of shorter-read technologies.
While satisfied with the results, Woyke does acknowledge that current techniques come with inherent problems. "So we are still using primers," Woyke said. "It is a kind of bias...these primers are designed based on existing ... 16S sequences ... and sometimes people try to optimize them and redesign them. But we know that they don't amplify everything. There are certain branches in the [evolutionary] tree that these primers do not capture."
Woyke hopes that researchers will eventually eliminate PCR from the process entirely and remove the primer bias. "But I think right now...it's still much cheaper to do PCR and try to get 16S PCR and then sequence that from the environment," she said.