NEW YORK (GenomeWeb) – Researchers from the University of Connecticut have shown that Oxford Nanopore's MinIon device can be used to sequence RNA transcripts, and now plan to test the device for sequencing whole transcriptomes.
Specifically, the group showed that the nanopore sequencing instrument could sequence a complex Drosophila gene in order to determine long-range information about the gene, like whether two exons that are far apart from each other are located on the same transcript. The group published its results late last month in Genome Biology after previously publishing its findings on the BioRxiv pre-print server.
Brenton Graveley, senior author of the study and professor of genomics and personalized healthcare at UConn, told GenomeWeb that the lab began using the MinIon about one year ago to see if it "would be good enough to determine the structure of transcripts."
The lab primarily uses Illumina sequencing technology, Graveley said, which is great for doing deep RNA sequencing, but "the biggest problem with transcriptome sequencing is if genes are alternatively spliced, you can't really tell which exons are present in the same transcript if they're farther apart than the size of fragment," which is typically 200 bp to 300 bp.
In the study, the researchers showed that they could sequence full-length transcripts for four Drosophila genes — DSCAM1, MRP, MHC, and RDL. DSCAM1 is one of the "most complicated genes," with the potential for 38,000 isoforms, Graveley said. "With short-read sequencing, it's really impossible to get at the issue."
The researchers sought to figure out whether the MinIon could be used to identify which isoforms were present and how many of them were there, Graveley said. The team demonstrated that was possible on the MinIon in their Genome Biology proof-of-concept study, and now plan to sequence more genes and eventually push on to sequence whole transcriptomes.
Pacific Biosciences has also positioned its RS II instrument as an ideal platform for sequencing full-length transcripts, and a number of researchers have used the technology to identify novel isoforms even from well-characterized transcriptomes.
Graveley said that his lab has also used PacBio's technology for this purpose and added that while the accuracy of the PacBio is still better than that of the MinIon, one main disadvantage is its approximately $700,000 price tag. In addition, Graveley said, the RS II still requires a large input to sequence individual genes, which can cause PCR artifacts. To address some of those issues with the RS II, PacBio recently launched Sequel, which is smaller and half the price of its RS II.
Graveley said that the team plans to continue to focus on sequencing complex alternatively spliced genes with the MinIon. "We're going one by one through them and doing targeted sequencing to figure out transcript structures," he said. He added that the lab would also start using the MinIon to sequence entire transcriptomes.
Graveley's lab focuses primarily on Drosophila and human, but for whole-transcriptome sequencing, the group plans to test the MinIon on yeast transcriptomes, which are not as large as, say, a human transcriptome, he said. The main challenge with using the MinIon for human transcriptomes, he added, is that the device currently does not have high enough throughput. Abundant transcripts, like actin and tubulin, end up getting sequenced many times, but the transcripts that are expressed at low levels become hard to identify because of the limited number of reads, he said.
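The throughput problem Graveley describes can be illustrated with a back-of-the-envelope calculation. The Python sketch below uses purely hypothetical numbers: the run size and the two abundance levels are illustrative assumptions, not figures from the study, and reads are treated as independent draws from the transcript pool.

```python
def expected_reads(total_reads: int, copies_per_million: int) -> float:
    """Expected number of reads for a transcript at the given abundance."""
    return total_reads * copies_per_million / 1_000_000


def prob_at_least_one(total_reads: int, copies_per_million: int) -> float:
    """Chance of seeing the transcript at least once in the run."""
    p = copies_per_million / 1_000_000
    return 1 - (1 - p) ** total_reads


TOTAL_READS = 100_000   # hypothetical run size
ABUNDANT_CPM = 10_000   # hypothetical: 1 percent of all transcripts
RARE_CPM = 1            # hypothetical: one transcript in a million

for name, cpm in [("abundant", ABUNDANT_CPM), ("rare", RARE_CPM)]:
    print(
        name,
        expected_reads(TOTAL_READS, cpm),
        round(prob_at_least_one(TOTAL_READS, cpm), 3),
    )
```

Under these assumed numbers, the abundant transcript would be expected to yield about 1,000 reads, while the rare one would have less than a 10 percent chance of being seen at all.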
The UConn team was involved with the model organism Encyclopedia of DNA Elements, or modENCODE, project to characterize regulatory patterns in model organisms. Graveley's team focused on characterizing transcriptional profiles of Drosophila at 30 different developmental stages using RNA-seq, cDNA sequencing, and tiling arrays.
From that work, Graveley said the researchers learned a lot about the fruit fly transcriptome, but even still, it was "tricky to disentangle isoforms." Now, he said, the researchers plan to use the MinIon to evaluate around 100 fruit fly genes from many different tissues throughout development, which is a good "way of looking at which isoforms are expressed at which time and how they change throughout time."
In the Genome Biology study, Graveley's team showed that the MinIon has potential for this application by sequencing four genes, and identifying nearly 8,000 distinct isoforms expressed by those genes.
For the DSCAM1 gene, which is especially complex, the team focused on exons 4 through 10, which encompass 93 variants within exons 4, 6, and 9, as well as over 19,000 potential isoforms.
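The DSCAM1 figures follow from simple combinatorics. As a rough sketch in Python, assume the commonly cited variant counts of 12, 48, and 33 for the exon 4, 6, and 9 clusters, plus two variants for exon 17. These per-cluster numbers come from the wider Dscam1 literature rather than from this article.

```python
from math import prod

# Assumed per-cluster variant counts (from the wider Dscam1 literature,
# not stated in this article).
CLUSTER_VARIANTS = {"exon 4": 12, "exon 6": 48, "exon 9": 33}
# Assumed; exon 17 lies outside the exon 4-10 region the team sequenced.
EXON_17_VARIANTS = 2

# Variants are mutually exclusive within a cluster, so the counts add up
# to the number of variants and multiply to the number of combinations.
total_variants = sum(CLUSTER_VARIANTS.values())
isoforms_exons_4_to_10 = prod(CLUSTER_VARIANTS.values())
isoforms_full_gene = isoforms_exons_4_to_10 * EXON_17_VARIANTS

print(total_variants)          # 93
print(isoforms_exons_4_to_10)  # 19008
print(isoforms_full_gene)      # 38016
```

Under those assumptions, the sum matches the 93 variants and the product matches the "over 19,000" combinations for exons 4 through 10, while doubling for exon 17 gives roughly the 38,000 isoforms quoted for the whole gene.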
The team looked at DSCAM1 isoforms from RNA isolated from Drosophila heads. Of the 93 potential variants, they identified 92. They verified the accuracy using RT-PCR and sequencing on the Illumina MiSeq and found that the results had 95 percent concordance. Next, they used the same approach to sequence the RDL, MRP, and MHC genes, which have the potential for four, 16, and 180 isoforms, respectively. The team obtained 301, 337, and 112 full-length reads for RDL, MRP, and MHC, respectively.
Although the team did not observe all possible isoforms, the authors wrote that some MHC isoforms, in particular, have spatial and temporal patterns of expression, thus "it is likely that other MHC isoforms that we did not observe, could be observed by sequencing other tissue samples."
Graveley said that this is one reason the researchers plan to sequence genes in different tissues at different developmental stages, to better capture the potential diversity of isoforms.
He added that he looks forward to further improvements in the MinIon's accuracy and throughput, which have already increased since the group started using it. For instance, he said, their device is generating a larger number of 2D reads than it did previously, which helps with accuracy, and speed has increased to 70 bases per second, from 35 bases per second, which improves the throughput.
Read lengths are also really large on the MinIon, he said. In the Genome Biology study, he said the longest read was about 8 kb, but length was limited because the researchers were sequencing PCR products. When sequencing DNA, the lab has generated reads as long as 100 kb. "Read length is limited by how big you make your fragments," he said. "And most people have a hard time making large fragment libraries because even pipetting will shear the DNA."
Aside from continuing to characterize alternatively spliced genes and moving into whole-transcriptome sequencing, Graveley said he would be interested in doing direct RNA sequencing in order to detect base modifications, which he said Oxford Nanopore is working to enable.