NEW YORK (GenomeWeb) – Researchers at Brown University have developed a strategy to increase the read lengths of Oxford Nanopore Technologies' MinIon device, and they plan to employ it in a hybrid sequencing approach to assemble the Sciara fly genome de novo.
In a preprint posted on the bioRxiv site, the team described a modified sample prep process that enabled them to increase read lengths to over 100 kb.
One of the promises of nanopore sequencing is that, theoretically, it should enable very long read lengths. Thus far, however, average MinIon read lengths have hovered around 10 kb with reads longer than 100 kb rare.
The Brown group figured that one reason read lengths have not been longer is that the DNA gets too fragmented during the sample prep process, so they tested three modified sample prep protocols intended to reduce fragmentation and increase read lengths.
For the first run, the group skipped the DNA shearing step entirely and used wide-tipped pipettes to minimize DNA breakage. In addition, they performed longer AMPure bead cleanups to help release the longer DNA strands from the beads.
The group ended up with 39 molecules longer than 50 kb and 21 molecules longer than 100 kb. The DNA molecule N50 was 25 kb, and just under 80 percent of the summed length was contained in molecules longer than 10 kb.
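For readers unfamiliar with the N50 statistic, the short Python sketch below shows one common way to compute it, along with the share of summed length held in molecules above a size cutoff. The function names and the example lengths are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
def n50(lengths):
    """Return the N50: the length L such that molecules of length >= L
    together hold at least half of the summed length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest molecule down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def fraction_of_bases_above(lengths, cutoff):
    """Return the fraction of summed length held in molecules longer than cutoff."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("summed length must be positive")
    return sum(length for length in lengths if length > cutoff) / total


if __name__ == "__main__":
    # Hypothetical molecule lengths in kb, for illustration only.
    example_kb = [2, 3, 4, 5, 6, 10]
    print("N50 (kb):", n50(example_kb))
    print("Fraction of length in molecules > 4 kb:",
          fraction_of_bases_above(example_kb, 4))
```

Because N50 is weighted by length rather than by molecule count, a small number of very long molecules can raise it considerably, which is why it is often reported alongside mean and median sizes.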
The longest 2D read (a read in which both strands of the double-stranded molecule have been sequenced and base called together) was 103 kb. The longest 1D read was 304 kb, although the quality score for that read was only 2.12. The longest 1D read with a Q score greater than 3.5 was 202 kb. Typically, a Q score of 9 is considered "high-quality"; however, the authors note that lower-quality reads may still align to a reference genome.
Skipping the shearing step had some side effects, the researchers noted. First, the percentage of 2D reads was low, about 22 percent. 2D reads are more desirable, since base calling both strands improves the accuracy. Second, output was lower than the 100 mb to 400 mb routinely achieved.
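To put these Q scores in perspective, the sketch below converts a quality score to an estimated per-base error probability. It assumes the scores quoted here follow the standard Phred convention, Q = -10 log10(p), which the article does not state explicitly; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability.

    Assumes the standard Phred definition: Q = -10 * log10(p).
    """
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Q scores mentioned in the article.
    for q in (2.12, 3.5, 9):
        print(f"Q {q}: estimated error probability {phred_to_error_prob(q):.3f}")
```

Under that assumption, a Q score of 10 corresponds to an estimated error rate of one base in 10, so scores in the low single digits imply that a large share of bases may be miscalled.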
The authors attributed these side effects to the increased fragility of long DNA molecules that may have caused them to break after end repair, which would lead to a higher proportion of 1D molecules.
To try to find a balance between read length, output, and proportion of 2D reads, the researchers tried two other protocols. In the second run, after extracting DNA they vortexed it at full speed for 30 seconds. They then used normal pipette tips through the end repair step before switching to wide-bore tips and gentle pipetting. This run resulted in about half of the reads being 2D, an output of 386.9 mb, and a molecule N50 of 13.6 kb. Proportionally, there were fewer long reads, although total reads above 50 kb increased due to the higher output.
For the third run, the team tried to increase the amount of data contained in reads longer than 10 kb. They made a few modifications to the protocol, including changing the standard AMPure bead step to deplete DNA molecules smaller than 10 kb. They also did two AMPure cleanup steps before and after end repair to remove smaller molecules that may have been generated from breakage.
This run had the highest mean and median molecule sizes. The longest 1D read was just under 140 kb with a Q score of 4.28, while the longest 2D read was around 85 kb with a Q score of 8.87. The molecule N50 was 28.8 kb. Throughput was 70.1 mb of summed molecule lengths, "suggesting that there is a trade-off between read length and output/2D molecules," the authors wrote.
Lead author John Urban, a PhD candidate in senior author Susan Gerbi's lab at Brown University, declined to comment on the publication itself since the group is submitting it to a peer-reviewed journal, but said that the team plans to use this protocol in a hybrid sequencing strategy with short reads generated by Illumina sequencing technology to de novo sequence the Sciara fly genome.
Gerbi further noted that the Sciara genome has not been sequenced and assembled and the closest related reference genome is for Drosophila, which is a "distant relative." In addition, the Sciara fly has an "enormous amount of unique biological strategies," she said.
Specifically, she said her lab is interested in studying the phenomenon of DNA re-replication. Typically, when cells replicate their genomes, they do so only one time, Gerbi said. However, the Sciara genome has a unique feature in that it contains certain loci that replicate more than once. In order to understand that mechanism, having a reference genome is necessary, she said.
Gerbi said that initially, the team tried to sequence and assemble the genome with Illumina data alone, but quickly realized that longer reads would be necessary to get a reference-grade genome. She said that they are now trying several hybrid approaches, including one with nanopore reads from the MinIon and another with long reads from Pacific Biosciences' RS II machine. Gerbi said her lab is collaborating with a group at the Icahn School of Medicine at Mount Sinai to generate the PacBio data, and has also tested BioNano Genomics' DNA mapping technology.
"Our hybrid assembly will use all these different sequence technologies," she said. Last month, the team published on an initial sequencing of the Sciara genome using a variety of technologies, including the MinIon, in the Journal of the Federation of American Societies for Experimental Biology.
Since then, the investigators have been working on improving the genome assembly and their use of nanopore data.
Urban noted that sharing data and experiences with other MinIon early-access users has been particularly helpful. For instance, he said that through discussions with other users, he realized that while certain metrics related to the system's technology and software were very stable from lab to lab, read length was highly variable. When he saw the variable read lengths that different labs were getting with the MinIon, he figured that the methods and techniques individuals were using to handle the DNA could be playing a role in read length. "DNA longer than 30 kb to 50 kb is very fragile," he said.