NEW YORK (GenomeWeb) – Using only nanopore-generated reads, researchers from agbio company KeyGene have assembled the genome of the fungus Rhizoctonia solani.
KeyGene's Martin de Vos and his colleagues used Oxford Nanopore Technologies' MinION to generate a 54-megabase genome sequence for R. solani, an agricultural pest that infects and causes disease in a number of economically important crops, including maize, rice, and soybean.
In a preprint available at bioRxiv, de Vos and his colleagues reported that their read length optimization approach enabled them to generate a highly contiguous assembly that is larger than ones previously generated using short read-based approaches. However, they noted that their nanopore-based assembly has a higher error rate than is typically desired.
Still, the researchers wrote that their "results indicate that high-quality, nearly finished eukaryotic genomes can be achieved with moderate efforts and at low cost."
The researchers used size selection to weed out small DNA fragments from the R. solani samples they isolated, leaving them with only high molecular weight DNA to maximize their read lengths. From that, they generated three long-insert nanopore libraries: two prepared from randomly sheared DNA with mean fragment lengths of 12.5 kilobases and 18.8 kilobases, respectively, and the third from intact genomic DNA.
Altogether, sequencing these libraries on the Oxford Nanopore MinION produced nearly 77,800 2D pass reads, totaling 834 megabases of sequence with an average read length of 10.7 kilobases. The majority of the long reads, the researchers noted, came from the non-sheared library.
De Vos and his colleagues assembled these reads using the Canu assembler into 606 contigs spanning 54 megabases, with an N50 contig length of 199 kilobases.
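As a rough check on these figures, the reported read count and total yield can be turned into a mean read length and an approximate depth of coverage. The Python sketch below is illustrative only: the function names are hypothetical, the read count is the rounded "nearly 77,800" from the article, and the 54-megabase genome size is the reported assembly size, not an independent estimate. The coverage figure it produces is not one the researchers stated.

```python
# Illustrative arithmetic only; inputs are the rounded figures quoted in the article.

def mean_read_length(total_bases: int, read_count: int) -> float:
    """Average read length in bases."""
    return total_bases / read_count


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Approximate sequencing depth: total bases divided by genome size."""
    return total_bases / genome_size


TOTAL_BASES = 834_000_000    # 834 megabases of 2D pass reads
READ_COUNT = 77_800          # "nearly 77,800" reads (rounded)
ASSEMBLY_SIZE = 54_000_000   # assumption: reported assembly size used as genome size

mean_len = mean_read_length(TOTAL_BASES, READ_COUNT)
coverage = fold_coverage(TOTAL_BASES, ASSEMBLY_SIZE)

print(f"mean read length: {mean_len / 1000:.1f} kb")
print(f"approximate coverage: {coverage:.1f}x")
```

Under these assumptions, the mean read length matches the reported 10.7 kilobases, and the data set works out to roughly 15-fold coverage of the assembled genome.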
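N50 is a standard measure of assembly contiguity: it is the length of the contig at which, counting from the longest contig downward, the running total first reaches half of the total assembly length. The short Python sketch below shows one common way to compute it. The function name and the toy contig lengths are hypothetical and are not data from the study.

```python
# Minimal N50 sketch; the contig lengths below are made-up toy values.

def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

In this toy case the two longest contigs (80 and 70) together reach half of the 300-base total, so the N50 is 70. An N50 of 199 kilobases therefore means that half of the 54-megabase assembly sits in contigs at least that long.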
This new assembly, the researchers reported, is the most contiguous R. solani assembly yet as well as the largest published genome assembled from only nanopore reads. They noted that the N50 contig size of their nanopore assembly was 28 times larger than some previously reported short read-based R. solani assemblies.
The investigators further compared their assembly to one they generated from a single paired-end Illumina MiSeq run. The MiSeq run generated 13.9 million merged read pairs with an average fragment length of 360 nucleotides, which were assembled into 123,016 contigs with a total length of 71 megabases and an N50 contig length of 1,029 nucleotides.
The researchers reported that, if they assumed the MiSeq data to be perfect, the nanopore assembly would have an error rate of one substitution error for every 2,186 bases, one insertion error for every 700 bases, and one deletion error for every 297 bases.
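To put the three per-base rates on a common scale, they can be converted to fractions and added. The Python sketch below does this under a simplifying assumption that is not from the preprint: that the three error types were counted against the same base denominator, so that their rates can simply be summed. The combined figure is a back-of-the-envelope estimate, not a value the researchers reported.

```python
# Back-of-the-envelope estimate; summing the rates is an assumption, not the authors' method.

ERRORS_PER_BASE = {
    "substitution": 1 / 2186,  # one substitution per 2,186 bases
    "insertion": 1 / 700,      # one insertion per 700 bases
    "deletion": 1 / 297,       # one deletion per 297 bases
}

combined_rate = sum(ERRORS_PER_BASE.values())
bases_per_error = 1 / combined_rate
approx_identity = 1 - combined_rate

for kind, rate in ERRORS_PER_BASE.items():
    print(f"{kind}: {rate:.4%}")
print(f"combined: {combined_rate:.3%} (about one error per {bases_per_error:.0f} bases)")
print(f"approximate identity: {approx_identity:.2%}")
```

Under that assumption, the rates add up to roughly 0.5 percent, or about one error per 190 bases, with deletions accounting for most of the total.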
While this error rate is higher than desired, the researchers said they expect that improvements in nanopore sequencing technology and chemistry may address the issue. They noted that they plan to harness high molecular weight DNA fragments and the throughput of the PromethION sequencer to tackle repetitive plant genomes.