NEW YORK – A new protein nanopore with two constrictions for DNA to pass through appears to improve the sequencing accuracy for some homopolymer regions, according to new research by scientists at the Free University of Brussels (VUB) and Oxford Nanopore Technologies.
The new dual-reader pore is a complex between the E. coli CsgG nanopore, which Oxford Nanopore licensed from the Belgian group several years ago, and a peptide from another E. coli protein, CsgF, that helps form the second constriction. Earlier this week, the researchers, led by Han Remaut, a professor of structural and molecular biology at VUB, and Sander Van der Verren, a graduate student in his lab, published the cryo-electron microscopy structure of the complex in Nature Biotechnology. They also showed that the dual-constriction pore improved the consensus accuracy for homopolymers up to eight nucleotides in length, in particular the longer ones.
However, one expert said he was not impressed by the results presented in the paper, though the researchers said the pore could be improved further.
Remaut, who is also deputy director of the VIB-VUB Center for Structural Biology, said he came to nanopore sequencing through his interest in bacterial cell surface proteins, in particular CsgG, which forms a transmembrane channel and is responsible for secreting the bacterial amyloid protein curli across the outer membrane of Gram-negative bacteria.
In 2014, his team published the X-ray structure of E. coli CsgG, "and looking at the channel, it also appeared to us that it had characteristics that could be interesting for nanopore sensing applications," he said. In particular, the channel has a well-defined constriction at its center, which modulates the electrical current signal when a molecule passes through the pore, he explained.
After the publication came out, Oxford Nanopore licensed the intellectual property his team had filed on the use of CsgG and its derivatives for sensing applications, and the two groups started collaborating. In the spring of 2016, Oxford Nanopore announced that it would move to a new pore chemistry, called R9, which it said was an engineered version of E. coli CsgG. The company made the move after it had been sued for patent infringement by Illumina, which claimed that Oxford Nanopore was using a different pore, MspA, in its products.
But the R9 pore, even though it improved the sequencing properties of the wild-type CsgG, still grappled with calling longer homopolymers accurately. "There is less information in the signal as homopolymers pass the nanopore channel, and for that reason, the accuracies go down starting from a 5-mer homopolymer," Remaut said.
His group's work on the new, dual-constriction pore resulted from an interest in the curli secretion pathway, he said, which involves two other proteins, CsgE and CsgF, that interact with the CsgG pore. CsgF, found on the extracellular side of the membrane, was known to be a coupling factor between the secretion and the assembly of curli fibers. When the researchers solved the structure of a complex of CsgG and CsgF using cryo-electron microscopy, they found that the N-terminal region of CsgF binds inside the beta barrel of CsgG and forms a second constriction approximately 3 nm above the entrance of the CsgG constriction.
Having a second constriction, they reasoned, could help resolve homopolymer base calls. "When a homopolymer passes through the [first] constriction, the bases are all the same bases, so there is very little modulation of the electrical signal," Remaut explained. "And if you have a second constriction, which is a fixed distance away from the primary constriction, you will also have a contribution from whatever base is 3 nm away. So you have two read points, if you want, rather than one dominant read point."
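The reasoning Remaut describes can be pictured with a toy model. The sketch below is not from the paper: the per-base current levels, the spacing of three bases between read points, and the weighting of the second constriction are all invented for illustration. It shows only that a single read point produces a flat signal across a homopolymer, while a second read point at a fixed offset adds variation from the flanking bases.

```python
# Toy model only: every number here is an assumption, not measured data.
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}  # hypothetical current contribution per base
OFFSET = 3  # hypothetical spacing between the two read points, counted in bases


def single_reader(seq):
    """Signal from one read point: depends only on the base at that position."""
    return [LEVELS[base] for base in seq]


def dual_reader(seq, offset=OFFSET, weight=0.5):
    """Signal from two read points: the base at position i plus a weighted
    contribution from the base `offset` positions further along the strand."""
    return [
        LEVELS[seq[i]] + weight * LEVELS[seq[i + offset]]
        for i in range(len(seq) - offset)
    ]


# A 6-mer poly-T run (positions 2 to 7) with short flanking sequence.
seq = "GA" + "T" * 6 + "CAG"
homopolymer = slice(2, 8)

single_levels = single_reader(seq)[homopolymer]
dual_levels = dual_reader(seq)[homopolymer]

print("single read point:", single_levels)
print("two read points:  ", dual_levels)
```

In this toy, the single read point reports the same level for all six T positions, whereas the dual reader's level changes once the second read point reaches the bases after the run.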
Next, the researchers determined that the N-terminal portion of CsgF alone, a peptide called FCP, still formed a tight and stable complex with CsgG, as well as with the R9 pore that Oxford Nanopore developed. They also showed that the complex still captured and translocated single-stranded DNA and still generated an electrical signal that has contributions from both constrictions. "And then the next step was to combine the FCP with one of the current R9 derivatives used by Oxford Nanopore for sequencing and see if the addition of a second constriction can help with base calling accuracy in homopolymers," Remaut said.
Using synthetic poly-T oligos ranging in length from three to nine nucleotides, they found that for 5-mers to 9-mers, the single-read accuracy improved with R9-FCP compared to R9. For 3-mers and 4-mers, though, the R9 pore turned out to be better. "Whereas the R9 from five nucleotides onwards starts to have difficulty, the dual-constriction pore will carry on at least until 9-mer homopolymers," Remaut said.
As a next step, they tested the two pores on genomic E. coli DNA and found that the R9-FCP data had better consensus accuracy than the R9 data for homopolymers up to eight nucleotides in length. For example, for 8-mers, the consensus accuracy dropped to 85 percent with R9 but was still 95 percent with R9-FCP. Because the E. coli genome has very few homopolymers longer than 8-mers, the researchers could not test the performance of the complex on longer ones with good statistics, Remaut said.
Not everyone is impressed by the results, though. "I find it interesting, but I'm not all that convinced about the benefits that this modification to CsgG brings," said Jens Gundlach, a researcher at the University of Washington whose group published seminal nanopore sequencing work using the MspA pore.
One reason, he said, is that the introduction of the second constriction resulted in the pore generating a much smaller ion current signal than the R9 pore. "It's all in the current differences. One wants to have large current differences between adjacent nucleotides," he said. "This thing really kills it — it is pretty useless, I would say, as a sequencing pore."
He also pointed out that the R9-FCP pore performed worse than R9 on short synthetic poly-Ts. And though it outperformed R9 for longer ones, it still made mistakes, calling some 6-mers as 1-mers, 2-mers, or 3-mers, for example. "If there is a homopolymer in there and this thing calls it as a single nucleotide, that totally falsifies the result," he said. "If I were Oxford Nanopore, I would not invest money in this."
In general, the idea of using a dual-reader pore is flawed, he said. "You're just measuring one variable, the ion current, so you cannot distinguish between the two read heads that are put in there. You don't know which one it is that modifies the current." Instead, he said, the signal will have contributions from several bases in the two constrictions, which "kind of washes the signal out … and therefore the general sequencing ability of this pore is poor."
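Gundlach's objection can also be sketched as a toy calculation. Again, the current levels and the equal weighting of the two read points are invented for illustration and do not come from the paper or from any real pore. The sketch enumerates every pair of bases that could sit in the two constrictions and groups the pairs by the single summed current they would produce.

```python
from collections import defaultdict
from itertools import product

# Toy model only: hypothetical current contribution per base.
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def pairs_by_current(weight=1.0):
    """Group all (first read point, second read point) base pairs by the
    one summed current value they would produce in this toy model."""
    groups = defaultdict(list)
    for first, second in product(LEVELS, repeat=2):
        current = LEVELS[first] + weight * LEVELS[second]
        groups[current].append(first + second)
    return dict(groups)


groups = pairs_by_current()
# Current values shared by more than one base pair cannot be told apart
# from a single measurement.
ambiguous = {current: pairs for current, pairs in groups.items() if len(pairs) > 1}

for current in sorted(ambiguous):
    print(current, ambiguous[current])
```

With these made-up numbers, the 16 possible base pairs collapse onto seven distinct current values, so several pairs share a level. This says nothing about how a real base caller performs; it only illustrates why one summed measurement, taken in isolation, is ambiguous.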
While the dual-reader pore might be useful as a secondary pore, taking advantage of its ability to read homopolymers of a certain length more accurately, "you will do much better with other tricks," Gundlach said. He declined to reveal details about other ways to improve homopolymer calls, citing a grant proposal his lab had just submitted.
Remaut said there is still room for improvement of the R9-FCP pore, and the data in the paper are from "just a very early prototype, but with very promising characteristics." For example, both FCP and R9 could still be engineered to improve the properties of the channel. In addition, the base calling software he and his colleagues used was trained on R9, not the dual-constriction pore. "These are still areas of further improvement until you can really test how far this dual-constriction channel can go," he said.
Another area of potential concern is the stability of the pore complex. "We see that the lifetime of the complex is above 24 hours, and some pores carry on above 48 hours," Remaut said. "That is one point where engineering attention would also have to go, to make sure that the pore would have a similar lifetime as R9, that this is not the price you pay for having the dual-constriction pore."
In the meantime, Oxford Nanopore, which doesn't disclose the precise design of its nanopores, currently offers flow cells with two different types of pores, called R9.4.1 and R10.3. The R10 pore has a dual-reader head, and combined with specific analysis methods, this pore improves the analysis of particular genomic regions, such as longer homopolymer repeats. "The paper is a very interesting analysis of this concept," a company spokesperson said in an email.