NEW YORK (GenomeWeb) – A group of researchers from the University of Queensland and Imperial College London has developed a hybrid assembly method that combines short-read sequence data with nanopore sequence data from Oxford Nanopore Technologies' MinIon.
The researchers developed the method to study antibiotic resistance in bacterial genomes that they had previously sequenced with Illumina technology but had not been able to assemble completely. Besides making the approach hybrid, they built in the ability to analyze the nanopore data in real time so that the assembly could be stopped once the genome was complete.
The new method was described in a paper published to the pre-print server bioRxiv this week.
Lachlan Coin, group leader of the genomics of development and disease division at the University of Queensland and senior author of the study, told GenomeWeb that in the future the group wants to modify the assembly tool to work only with nanopore sequence data, and then test the MinIon's ability to sequence and assemble bacterial genomes in real time in the field, such as at a hospital.
Coin's lab has been working with the MinIon for just under two years and is interested in using it to understand how bacteria acquire antibiotic resistance. "We want to be able to identify the strain, species, [and] antibiotic resistance profile in real time from a biological sample," he said.
Prior to securing a MinIon, his lab relied primarily on Illumina technology to sequence bacterial genomes and thus had built up a collection of samples that had been sequenced and partially assembled. The group wanted a tool that would enable them to use the MinIon to create better assemblies by joining contigs and filling gaps.
The team's hybrid assembler, npScarf, builds on a previous paper, also posted to the bioRxiv pre-print server, that describes a method for analyzing MinIon sequence data in real time. In that publication, the researchers built streaming programs into their sequence analysis pipeline that process data in a step-wise fashion as it is generated, rather than at the end of a sequencing run.
In the more recent study, the team again relied on the streaming program to perform assembly in real time. First, the assembler analyzes the short-read contigs, determining the average depth of coverage for the 20 longest contigs that are longer than 20 kilobases, since those are most likely to be unique contigs rather than contigs composed of repetitive sequence.
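The depth-estimation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not npScarf's actual code: the Contig record, the length-weighted average, and the 1.5-fold cutoff used to flag likely repeats are all hypothetical choices, while the 20-contig and 20-kilobase limits come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    name: str
    length: int   # contig length in bases
    depth: float  # mean short-read depth of coverage (assumed to be known)


def estimate_genome_depth(contigs, min_length=20_000, top_n=20):
    """Length-weighted mean depth of the top_n longest contigs above min_length.

    Weighting by length is an assumption; the article only says "average".
    """
    long_contigs = sorted(
        (c for c in contigs if c.length > min_length),
        key=lambda c: c.length,
        reverse=True,
    )[:top_n]
    if not long_contigs:
        raise ValueError("no contig exceeds the minimum length")
    total_bases = sum(c.length for c in long_contigs)
    return sum(c.length * c.depth for c in long_contigs) / total_bases


def is_likely_unique(contig, genome_depth, max_ratio=1.5):
    """Hypothetical rule: a contig near the genome-wide depth is treated as unique."""
    return contig.depth / genome_depth < max_ratio


contigs = [
    Contig("a", 100_000, 50.0),
    Contig("b", 50_000, 56.0),
    Contig("rep", 5_000, 150.0),  # short, high-depth contig: probably a repeat
]
genome_depth = estimate_genome_depth(contigs)
```

In this toy input, the short contig with roughly three times the estimated depth would be set aside as a probable repeat, while the two long contigs would be candidates for the backbone.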
Then, it uses that information, as well as the long nanopore reads, to determine a genome backbone, pairing two contigs with one longer nanopore read to establish the relative position of the contigs.
The npScarf assembler also incorporates streaming processing to align the long reads as they arrive. When nanopore reads connect two unique contigs, a bridge is generated. As data continues to stream in, the gap is considered filled once support for that bridge reaches a predefined threshold. Next, npScarf goes back through the contigs that were not used to form the backbone to look for contigs that contain information about repetitive regions. Those are also identified with help from the long-read data, and can be used to extend the scaffolds.
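The bridging logic described above can be illustrated with a small streaming counter. The sketch below is an assumption-laden simplification rather than npScarf's implementation: the BridgeTracker class, its min_support parameter, and the input format (a list of contig names in the order they align along one long read) are hypothetical, and contig orientation and gap distance are ignored.

```python
from collections import Counter


class BridgeTracker:
    """Counts long reads that link pairs of unique contigs (hypothetical sketch)."""

    def __init__(self, unique_contigs, min_support=3):
        self.unique = set(unique_contigs)
        self.min_support = min_support  # assumed threshold, not from the paper
        self.support = Counter()
        self.filled = set()

    def add_read(self, aligned_contigs):
        """Process one long read as it arrives.

        aligned_contigs lists contig names in the order they align along the read.
        Returns the bridges that this read pushed over the support threshold.
        """
        anchors = [c for c in aligned_contigs if c in self.unique]
        newly_filled = []
        for left, right in zip(anchors, anchors[1:]):
            if left == right:
                continue
            bridge = frozenset((left, right))
            self.support[bridge] += 1
            if self.support[bridge] >= self.min_support and bridge not in self.filled:
                self.filled.add(bridge)
                newly_filled.append(bridge)
        return newly_filled


tracker = BridgeTracker({"c1", "c2", "c3"}, min_support=2)
first = tracker.add_read(["c1", "rep", "c2"])   # one supporting read: not enough yet
second = tracker.add_read(["c2", "c1"])         # second supporting read fills the gap
third = tracker.add_read(["c1", "c2"])          # bridge already filled, nothing new
```

Because each read updates the counts as soon as it is aligned, the set of filled bridges can be inspected at any point during the run rather than only at the end.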
"You do the scaffolding while the data is coming in and can look at the assembly and know when you can stop," Minh Duc Cao, lead author of the study, told GenomeWeb.
As the assembly proceeds, the algorithm updates statistics such as the N50 and indicates when the genome is circularized, Coin said, allowing researchers to monitor the assembly. It will also assign genes to plasmids and identify pathogenicity islands, two features that are especially important for understanding antibiotic resistance, he said.
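For readers unfamiliar with the statistic, the N50 is the contig length at which the contigs of that length or longer contain at least half of the assembled bases. The Python sketch below shows the standard calculation and how joining two contigs changes it; the merge_contigs helper and the toy lengths are illustrative assumptions, not part of npScarf.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


def merge_contigs(lengths, i, j, gap=0):
    """Return a new list in which contigs i and j are joined across a gap (hypothetical helper)."""
    merged = lengths[i] + lengths[j] + gap
    rest = [length for k, length in enumerate(lengths) if k not in (i, j)]
    return rest + [merged]


before = [300, 200, 100, 100]          # toy contig lengths, in kilobases
after = merge_contigs(before, 1, 2)    # a bridge joins the 200 kb and 100 kb contigs
n50_before = n50(before)
n50_after = n50(after)
```

Recomputing the value after every join is cheap, which is what makes it practical to report the N50 continuously while reads are still arriving.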
The researchers tested the performance of npScarf on two Klebsiella pneumoniae strains. They first sequenced the genomes on an Illumina MiSeq to 250-fold coverage and assembled them with the SPAdes assembler, generating assemblies of 90 and 69 contigs, with N50s of 288 kb and 302 kb, respectively.
Next, they sequenced each genome on the MinIon. For the first strain, they generated 185 Mb of sequence data, or 33-fold coverage, 27 Mb of which were 2D reads. For the second strain, they had to do two sequencing runs on the MinIon and the combined sequencing data yielded 100 Mb, or 18-fold coverage, 22.5 Mb of which were 2D reads.
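The yield and coverage figures can be cross-checked with simple arithmetic, since fold coverage is total bases divided by genome size. The article does not state a genome size, so the values computed below are implied by the reported numbers rather than taken from the paper; the function name is hypothetical.

```python
def implied_genome_size_mb(yield_mb, fold_coverage):
    """Genome size implied by coverage = yield / genome size."""
    return yield_mb / fold_coverage


# (total yield in Mb, fold coverage, 2D-read yield in Mb) as reported in the article
runs = {
    "strain 1": (185.0, 33.0, 27.0),
    "strain 2": (100.0, 18.0, 22.5),
}

summary = {}
for name, (yield_mb, coverage, two_d_mb) in runs.items():
    size_mb = implied_genome_size_mb(yield_mb, coverage)
    two_d_fraction = two_d_mb / yield_mb
    summary[name] = (size_mb, two_d_fraction)
    print(f"{name}: ~{size_mb:.1f} Mb genome, {two_d_fraction:.1%} of yield in 2D reads")
```

Both runs imply a genome of roughly 5.6 Mb, so the two sets of figures are consistent with each other, and 2D reads account for about 15 percent and 22.5 percent of the yield, respectively.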
During analysis, the pipeline continuously updated the assembly statistics, including the number of contigs, the N50, and the number of circular sequences.
For the first strain, assembly was complete after using less than 120 Mb of nanopore sequence data, reducing the number of contigs to four from 90, one of which was circularized. To validate the assembly, the researchers aligned it to the reference genome, which contained 16 contigs in five scaffolds.
On doing so, the researchers found that two contigs in the reference genome were oriented incorrectly and also identified a plasmid sequence in their hybrid assembly that was in a gap in the reference genome.
They also found that one of the reference scaffolds, which seemed to represent the bacterial chromosome, was fragmented into two contigs in the hybrid assembly. On further examination, the researchers identified that the two contigs were separated by an rRNA operon that was not present in any of the nanopore reads. They anticipated that fragmentation could be resolved with additional nanopore sequence data.
Next, they looked at npScarf's ability to identify genes within genomic islands and plasmids by comparing it with an annotation of the Illumina-only assembly.
They found that npScarf identified the six genomic islands the prior annotation found, as well as four additional islands, one of which contained three antibiotic resistance genes. The genomic island was likely not identified by the previous annotation because of the presence of repetitive sequences, which "caused the island to be fragmented into 10 contigs in the Illumina assembly," the authors wrote.
Developing a hybrid assembler for the MinIon is "the first step toward a completely de novo assembler," Coin said. And, he added, it will be a good tool for the lab to use to improve assemblies it has already built from Illumina data. "There's complementarity in these technologies," Coin said of the Illumina and Oxford Nanopore sequencers.
The researchers now plan to use npScarf to go through their collection of Illumina-based bacterial assemblies to better understand the transfer of antibiotic resistance, Coin said.
In addition, Cao said they are looking at tweaking npScarf to be able to assemble metagenomes in real time.
Ultimately, the goal is to perform the sequencing and assembly in real time on clinical samples, Coin said, "to rapidly identify when a patient has an antibiotic resistance infection, with the aim of giving them the right drug."