Researchers from Martin Luther University in Halle, Germany, have developed a genotyping-by-sequencing method suitable for lower-throughput benchtop instruments, such as Life Technologies' Ion Torrent PGM, which they say can be applied to population studies of non-model organisms and could potentially replace conventional methods such as microsatellite genotyping.
"It's a smart idea … [and] potentially quite useful," Charles Johnson, the director of genomics and bioinformatics at Texas AgriLife Research, who was not involved in the study, told In Sequence.
Published in PLoS One earlier this month, the strategy, dubbed RESTseq for restriction fragment sequencing, is similar to other genotyping-by-sequencing methods that use restriction enzymes to target genomic regions of interest in a large number of samples.
However, it differs in that it uses two rounds of restriction enzyme digestion to reduce the number of fragments that are sequenced. In the first step, the group used a restriction enzyme that cuts frequently throughout the genome, generating many fragments, and then ligated PGM-specific sequencing adapters to these fragments. To cut down the amount of sequencing required, they then digested the library with a second restriction enzyme, shrinking it to a size amenable to sequencing on the PGM.
"By using multiple restriction enzymes, we reduced the library quite a lot, [to analyze] a bit over 1,000 SNPs," Eckart Stolle, a researcher within the Institute of Biology at Martin Luther University and senior author of the paper, told IS. Then, "using barcoded adapters and pooled sequences, we amplified it and sequenced it."
Stolle's team first demonstrated that the method was reproducible by applying it to two honeybee samples, using TaqI and MseI as the first and second restriction enzymes, respectively. Because restriction enzymes can generate extremely short fragments, the team also included a size-selection step, keeping only fragments of around 90 bases. Each library was sequenced on a PGM 316 chip, generating 3.67 million and 2.71 million reads with average read lengths of 83 bases and 86 bases, respectively.
Following digestion with the first restriction enzyme, 99 percent of reads started with the correct triplet. The second enzyme was chosen to reduce AT content, and the researchers found that indeed, both libraries had a higher GC content of around 44 percent, compared to the overall GC content of the honeybee genome, which is around 32 percent. This "might enrich for fragments from coding regions, highly desirable when screening populations for patterns of selection," the authors wrote.
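The two-digest reduction described above can be sketched in silico. This is a toy illustration, not the authors' pipeline. It uses the known recognition sites of TaqI (T^CGA) and MseI (T^TAA), and it assumes that any first-digest fragment containing a second-enzyme site drops out of the library. The function names, the toy sequence, and the size window are hypothetical.

```python
# Illustrative in silico double digest; names and size window are hypothetical.
TAQI_SITE, TAQI_CUT = "TCGA", 1   # TaqI cuts T^CGA (cut falls 1 base into the site)
MSEI_SITE = "TTAA"                # MseI cuts T^TAA

def digest(seq, site, cut_offset):
    """Split seq at every occurrence of site, cutting cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def reduced_library(seq, min_len, max_len):
    """Keep TaqI-TaqI fragments that lack an MseI site and fit the size window."""
    # Drop the first and last pieces: on a linear toy sequence they have
    # a TaqI end on one side only.
    internal = digest(seq, TAQI_SITE, TAQI_CUT)[1:-1]
    return [f for f in internal
            if MSEI_SITE not in f and min_len <= len(f) <= max_len]

toy = "AAATCGAGGCCTCGAGGTTAACCTCGAGG"
library = reduced_library(toy, min_len=5, max_len=20)
print(library)
# Every retained fragment should begin with the TaqI-derived triplet CGA.
print(all(f.startswith("CGA") for f in library))
```

Under these assumptions, removing fragments that carry the AT-rich TTAA motif would be expected to shift the retained library toward higher GC content, in line with the enrichment the researchers reported.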
Using conservative settings, the team found that 72 percent and 77 percent of the reads from the two libraries mapped unambiguously to the genome and covered 11.06 and 10.05 megabases at 20-fold and 17-fold coverage, respectively.
To test the method's ability to analyze genomes without a reference, they performed a de novo assembly of the reads from the two samples and found that the resulting contigs covered 71 percent of the reference, confirming that "a de novo approach as such, is feasible to generate sufficient quality data for sound analyses, albeit yielding less consensus sequence than a reference-based approach due to the assembly computing."
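As a rough consistency check, the reported coverage can be approximated from the other figures in the article. The sketch below assumes that every unambiguously mapped read contributes its full average length to the covered region, and that the figures pair up in the order given (first library: 3.67 million reads, 72 percent mapped, 83 bases, 11.06 megabases). The function name is hypothetical.

```python
def mean_coverage(total_reads, mapped_fraction, mean_read_len, covered_bases):
    """Back-of-envelope fold coverage: mapped bases divided by bases covered."""
    return total_reads * mapped_fraction * mean_read_len / covered_bases

# Figures as reported in the article; the pairing by library is an assumption.
lib1 = mean_coverage(3.67e6, 0.72, 83, 11.06e6)
lib2 = mean_coverage(2.71e6, 0.77, 86, 10.05e6)
print(round(lib1, 1), round(lib2, 1))
```

Both estimates land close to the reported 20-fold and 17-fold values. The small gap is expected, since the inputs are rounded averages.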
Next, the team demonstrated that the method could be applied to species without a reference genome, testing it on several stingless bee genomes. For these, Stolle said that they only looked at a couple of loci, but going forward, the researchers plan to genotype more individuals and also more SNPs to "get an even spread across the genome."
Additionally, Stolle said he wants to modify the method so that it generates even fewer fragments. "We want to reliably reduce it so much that you end up with a smaller number of fragments to do small-scale genotyping, similar to what people do with microsatellites," he said.
Microsatellite genotyping relies on generating labeled primers for known loci. The technique is cost-efficient and works well, said Stolle. But for organisms without labeled primers, such as many of the bee species that his group studies, an even more scaled-down iteration of the RESTseq method is attractive because it doesn't rely on prior knowledge of the genome, he said.
"Even if we have no information beforehand, we just make the restriction library, reduce it a lot [with additional restriction enzyme digests] and then sequence it," he said. "Eventually, we want to replace microsatellite genotyping."
Johnson said that he would potentially test the method, which he called a "nice improvement" over other genotyping-by-sequencing methods, such as RAD-seq. AgriLife Research, an agricultural and life sciences research agency within the Texas A&M University system, currently runs many genotyping-by-sequencing research projects, focusing primarily on plant genomes.
Johnson added that one potential problem of the method is that it might have trouble in highly repetitive regions. Restriction enzymes will make "many cuts in these repetitive regions," which could make the method challenging to apply to plant species such as cotton and sugar cane, which have long stretches of repeats, he said.
All such genotyping-by-sequencing methods potentially have this problem, said Johnson, but one way that researchers have gotten around it is to use a methylation-sensitive restriction enzyme, making it more likely that the enzyme cuts only regions that are being actively transcribed. Johnson added that such an enzyme could also be used in the second digest of the RESTseq protocol, which would potentially make the method more amenable to species with repetitive regions.