Sequencing Platform Biases Contribute to Differences in 16S rRNA Bacterial Community Profiles

NEW YORK (GenomeWeb) – Differences detected in the makeup of complex microbial mixtures may result in part from the choice of sequencing platform, according to a new study by researchers from the University of Washington.

In a side-by-side comparison of the Illumina MiSeq and Thermo Fisher Scientific's Ion PGM for 16S rRNA bacterial community profiling, the scientists noticed higher error rates and a pattern of prematurely truncated reads for the PGM, resulting in biases for certain species.

They conducted their study, which was recently published in Applied and Environmental Microbiology, in collaboration with Life Technologies, prior to its acquisition by Thermo earlier this year. While the biases they found could be addressed by sequencing in both directions and by optimizing the flow order on the PGM, the authors cautioned that scientists should conduct their own evaluations prior to choosing a sequencing platform for a particular application.

"The [two] platforms are likely to be more or less comparable if your questions are at a very high level with respect to population comparisons," said Noah Hoffman, associate director of informatics in the department of laboratory medicine at UW and one of the senior authors of the study. "If high-resolution species-level classification is one of [your] objectives, then [you] should take into account these issues related to read truncation as a potential for organism-specific errors."

Hoffman and his colleagues conducted the comparison of the MiSeq and the PGM as part of the development of a new clinical assay for identifying bacterial species from complex patient-derived samples with mixed populations.

For that assay, it was important to be able to classify bacteria at the species level, Hoffman told In Sequence, rather than just gaining a broad overview at the population level.

The assay can now be ordered from the clinical molecular microbiology lab at UW's department of laboratory medicine and runs on the MiSeq platform. Results from the comparison study were one of several factors that went into the decision to perform the assay on the MiSeq, he said, a platform that is used for other applications in the laboratory as well.

For their performance comparison, the researchers analyzed an artificial mixture containing equal amounts of DNA from 20 bacterial species and a collection of 18 human-derived complex samples on both platforms. According to Hoffman, the 20-organism sample varied in base composition, representing different types of sequence.

In both cases, they sequenced PCR-amplified variable regions 1 and 2 of the 16S ribosomal DNA, a total of approximately 350 base pairs.

The MiSeq runs, which were performed at UW, generated 250 base paired-end reads, so the PCR products were covered from both ends, with partial overlap in the center.
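As a rough illustration of the "partial overlap in the center," the overlap between the two reads of a pair can be estimated from the read and amplicon lengths. This is a minimal sketch that assumes full-length reads, treats the approximately 350 base pair amplicon as exactly 350 bases, and ignores primers and adapters; the function name is made up for this example.

```python
def paired_end_overlap(amplicon_len: int, read_len: int) -> int:
    """Estimate how many bases are covered by both reads of a pair.

    Assumes both reads are full length and start at opposite ends
    of the amplicon (an idealization; real reads vary).
    """
    # A read cannot extend past the end of the amplicon.
    effective_read = min(read_len, amplicon_len)
    # Bases covered twice = total bases read minus amplicon length.
    return max(0, 2 * effective_read - amplicon_len)


# Article values: 250-base paired-end reads on a ~350 bp amplicon.
overlap = paired_end_overlap(amplicon_len=350, read_len=250)
print(f"Estimated overlap: {overlap} bases")  # 2 * 250 - 350 = 150
```

Under these assumptions, roughly 150 bases in the middle of the amplicon would be read from both directions.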

The PGM runs, some of which were performed at Life Technologies, yielded 400 base pair reads, and the researchers sequenced two independent libraries for each sample in order to obtain reads from both orientations.

Both platforms were able to detect all 20 species in the control sample. However, with the PGM, the researchers observed truncated reads for some species in at least one read orientation, while "virtually all MiSeq reads" were full length. For a couple of organisms in the test sample, for example, the PGM mostly generated short reads of under 100 base pairs in the forward direction.

The results suggest "both an organism- and orientation-dependent bias" that contributes to the prematurely truncated reads on the PGM, they wrote. The researchers were unable to find an easy explanation for this bias, for example, in the base composition, suggesting that "the underlying causes of the phenomenon are complex and may involve properties outside of the nucleotide sequence itself, such as secondary structure."

In collaboration with experts from Life Technologies, the UW team was able to reduce the level of read truncation somewhat by reprogramming the instrument to use a sequencing flow order that is optimized for abnormal secondary structures, but even with that, the problem "was still present" to some extent, Hoffman said.

They also observed higher average error rates for the PGM – 1.4 to 1.5 errors per 100 bases – than for the MiSeq, with 0.9 errors per 100 bases. While error rates for specific organisms differed between and within platforms, for most organisms, Illumina had more perfect reads than Ion Torrent, they wrote.
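To see why a difference of about half an error per 100 bases matters for the number of perfect reads, consider a simple back-of-the-envelope model. The sketch below assumes errors occur independently at every base with the same probability, which is an illustrative simplification and not the method used in the study; real sequencing errors tend to cluster by position and sequence context. The function name is hypothetical.

```python
def perfect_read_fraction(errors_per_100: float, read_len: int) -> float:
    """Expected fraction of error-free reads under an independence model.

    Assumption: every base has the same error probability and errors
    are independent. This is a simplification for illustration only.
    """
    per_base_error = errors_per_100 / 100.0
    return (1.0 - per_base_error) ** read_len


# Average error rates reported in the article, compared at the same
# illustrative read length of 250 bases.
for platform, rate in [("MiSeq", 0.9), ("PGM (low)", 1.4), ("PGM (high)", 1.5)]:
    fraction = perfect_read_fraction(rate, read_len=250)
    print(f"{platform}: {fraction:.1%} of reads expected to be error-free")
```

Under this model the share of perfect reads falls off exponentially with read length, so a modest difference in per-base error rate translates into a much larger difference in the fraction of error-free reads.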

Organism-specific differences in error rate can be of concern because lower-quality reads might be filtered out, Hoffman said, leading to an underrepresentation of particular organisms. With the PGM, such species-specific error rate differences "were not enormous, but they were detectable," he said, and are "something to be aware of."
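The effect Hoffman describes can be sketched with made-up numbers. The example below assumes two hypothetical organisms that contribute equal numbers of reads but pass a quality filter at different rates; the organism names, read counts, and pass rates are invented for illustration and do not come from the study.

```python
def apparent_abundance(true_reads: dict, pass_rate: dict) -> dict:
    """Relative abundance of each organism after quality filtering.

    true_reads: reads generated per organism before filtering.
    pass_rate:  fraction of each organism's reads that survive the filter.
    """
    kept = {org: true_reads[org] * pass_rate[org] for org in true_reads}
    total = sum(kept.values())
    return {org: count / total for org, count in kept.items()}


# Hypothetical input: equal true abundance, unequal filter pass rates.
true_reads = {"organism_A": 1000, "organism_B": 1000}
pass_rate = {"organism_A": 0.9, "organism_B": 0.6}

for org, share in apparent_abundance(true_reads, pass_rate).items():
    print(f"{org}: {share:.0%} of filtered reads")
```

With these invented values, two organisms present in equal amounts would appear at roughly 60 percent and 40 percent after filtering, even though nothing about the underlying sample has changed.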

According to Mike Lelivelt, senior director of nucleic acid bioinformatics products at Thermo Fisher Scientific, the upcoming new HiQ sequencing enzyme is bound to improve insertion and deletion error rates for the PGM, "which should improve the resolution that our long reads already provide."

In terms of replicating the known make-up of the 20-organism test sample, both platforms generated data that was "in good agreement with predicted values" for most of the species.

Two species seemed to have elevated levels according to both platforms, and three species seemed to have lower-than-expected levels according to the PGM, but only with reads from one orientation. One species appeared to be absent from the PGM data altogether when only processed reads with reverse orientation were considered in the analysis, but the species was present in reads with forward orientation.

For the 18 human-derived samples, where the bacterial composition was unknown, the two platforms delivered quite different results. For that study, the researchers combined forward and reverse reads from the Ion Torrent platform. For 13 samples, the relative abundance of at least one organism differed markedly between platforms. In nine of these cases, species were underrepresented in the PGM data or PGM reads were unable to classify them at the species level. In two cases, a species was identified by Ion Torrent sequencing but not detected by Illumina sequencing.

Overall, the study confirms that biases inherent to sequencing platforms have the potential to distort the results of microbial profiling studies. "We are in the early days of teasing out all of the complexities involved in characterizing complex mixtures of organisms, and each new technology needs to be carefully evaluated for its performance characteristics," Hoffman said.
