NEW YORK (GenomeWeb) – A new method that uses long oligonucleotide probes to capture and clone large numbers of kilobase-sized DNA fragments in parallel promises to enable high-throughput functional studies of proteins and genomic elements, target preparation for long-read sequencing, and other applications.
The approach, which uses long-adapter single-strand oligonucleotide (LASSO) probes, was published earlier this week in Nature Biomedical Engineering by a team of researchers at Massachusetts General Hospital and elsewhere.
For their study, they cloned more than 3,000 open reading frames, up to about 5 kilobases in size, from the Escherichia coli genome in parallel. They also used LASSO probes to clone human ORFs from cDNA, and to generate an E. coli ORF library from a complex human microbiome sample.
According to Ben Larman, an assistant professor of pathology at Johns Hopkins University and one of the corresponding authors, he and co-senior author Biju Parekkadan developed the method in order to make bacterial protein expression libraries for functional screening assays. Originally, they had planned to encode the proteins on bacteriophages, "but we realized that we could not really encode long enough protein fragments to construct a library that would be likely to yield interesting proteins," he said.
Instead, they came up with the idea to use modified molecular inversion probes (MIPs), which have been widely used to capture and amplify short DNA targets in parallel, to tackle larger targets. Traditional MIPs are short, single-stranded oligos about 150 bases in size, with target sequences at each end that bind to the ends of a DNA target. When the probe binds, it forms a circle and the space between the target sequences is filled in by DNA polymerase. Padlock probes are related to MIPs but do not have a gap in the target sequence that needs to be filled in.
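The geometry of a gap-fill probe can be illustrated with a short Python sketch. It is not taken from the paper: the function names, the arm length, and the placeholder backbone are assumptions chosen for illustration, and the "ligation arm" and "extension arm" labels follow common MIP usage rather than anything stated in the study. Both arms are reverse complements of the sequence flanking the gap on one target strand, so that the 3' end of the bound probe can be extended across the gap toward its own 5' end.

```python
# Illustrative only: arm length and backbone are hypothetical values.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_probe(target_strand, gap_start, gap_end, arm_len, backbone):
    """Build a gap-fill probe for target_strand[gap_start:gap_end].

    target_strand is written 5'->3'. The probe is returned 5'->3' as
    ligation arm + backbone + extension arm.
    """
    if gap_start - arm_len < 0 or gap_end + arm_len > len(target_strand):
        raise ValueError("not enough flanking sequence for the arms")
    upstream = target_strand[gap_start - arm_len:gap_start]
    downstream = target_strand[gap_end:gap_end + arm_len]
    ligation_arm = reverse_complement(upstream)      # sits at the probe's 5' end
    extension_arm = reverse_complement(downstream)   # sits at the probe's 3' end
    return ligation_arm + backbone + extension_arm


if __name__ == "__main__":
    target = "ACGTAC" + "GGGGGG" + "TTCAGA"  # flank + gap + flank
    print(design_probe(target, 6, 12, 6, "NNNN"))
```

In this toy case the gap is six bases long. A LASSO probe, as described in the article, would differ mainly in having a much longer adapter between the two arms so that kilobase-sized gaps can be spanned.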
"Seeing the ease of padlock- or molecular inversion-probe-based assays, this paper shows an interesting alternative to hybridization-based assays" for capturing larger DNA targets, said Alexander Hoischen, assistant professor of immuno-genomics at Radboud University Medical Center in the Netherlands, in an email. His lab has used MIPs to target shorter DNA regions for sequencing.
Previous studies have shown that the target DNA size for MIPs can be increased by making the adapter between the two ends of the probe longer, "but there was not a way to make such probes in high throughput," Larman explained. "That's when we did some brainstorming and methods development and came up with a way to convert, in a single-pot reaction, oligo libraries into capture probes that have very long adapters."
In essence, the researchers fuse two DNA molecules, one that contains the two targeting sequences and another that serves as a long adapter, by overlap-extension PCR. They then circularize the resulting molecule and conduct an inverted PCR reaction to yield linear DNA that has a target sequence at each end. Once the PCR priming sites have been removed and the molecules made single-stranded, they can be used as DNA capture probes.
Initially, the researchers tested individual LASSO probes that targeted DNA of four different sizes, including a 4-kilobase target, in the M13 bacteriophage genome and were able to capture all of them.
They then made a set of more than 3,100 LASSO probes with two different adapter lengths to target most ORFs in the E. coli genome and compared the results to a set of conventional MIPs targeting the same ORFs. Overall, about 75 percent of the targeted ORFs were successfully captured by the LASSO probes, whereas the MIPs captured fewer full-length ORFs.
They also used LASSO probes to capture two full-length ORFs from human cDNA libraries, and they employed their E. coli LASSO library to capture more than a thousand E. coli ORFs from DNA extracted from a human stool sample.
In contrast to probe hybridization methods, which also target kilobase-sized DNA fragments, the LASSO strategy allows researchers to clone DNA in the correct reading frame and to insert it directly into a protein expression vector, Larman said.
The advantage over existing cDNA-based methods is that those typically only yield a small number of protein-expressing clones. "We're able to make expression libraries where a very large fraction of the clones are full-length proteins, which gives you a tremendous advantage over traditional cDNA-based protein libraries," he said.
While the largest LASSO library the researchers used in their paper had about 3,000 probes, and the largest target was about 5 kilobases, it should be possible to generate tens of thousands of LASSO probes in parallel, and to tackle even longer targets.
Parekkadan, who is a faculty member in the Department of Surgery at Massachusetts General Hospital and at Rutgers University, said that the team is currently working on a new library with more than 8,000 LASSO probes. "We will see where the ceiling is in terms of number of probes as well as the length that can be captured," he said.
The material cost of making a library with tens of thousands of LASSO probes is around $2,000 to $3,000, he said, and there are "tremendous economies of scale" for making even larger libraries. "The issue of cost was on our mind" when developing the method, he said, and the team's hope is that it will be cheap enough for academic researchers to adopt.
The researchers have filed a patent application for the technology but have not licensed it out yet. "We're exploring all options," Parekkadan said, noting that discussions with potential licensees are ongoing.
One way to improve the method is to increase the uniformity with which targets are captured, and Parekkadan said his team has some ideas for how to do that. They also plan to modify the method so it can capture not only DNA but also RNA targets.
In addition, they want to improve the design of the LASSO probes, so more targets can be captured successfully. For the E. coli library, for example, they had some design constraints, so they were not able to generate probes for all ORFs, and during the capture, not all probes worked.
Larman said that for their published study, the researchers used conservative cutoffs for the probe design, excluding, for example, ORFs shorter than 400 base pairs in order to avoid amplification bias of small targets. "Future work will involve characterizing these effects, so we know where we can be less conservative in our thresholds," he said. Also, one way to avoid amplification bias might be to subdivide the target library into fractions that each cover a certain target size range.
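The two design ideas Larman mentions, a minimum target length and splitting the library by target size, can be sketched in a few lines of Python. This is a hypothetical illustration, not the researchers' pipeline: only the 400-base-pair cutoff comes from the article, while the function name, the size-range boundaries, and the example ORF names and lengths are invented.

```python
from bisect import bisect_right


def partition_orfs(orf_lengths, min_len=400, edges=(1000, 2000, 3000)):
    """Drop ORFs shorter than min_len and group the rest by size range.

    orf_lengths maps an ORF name to its length in base pairs.
    edges are hypothetical boundaries; an ORF whose length equals an
    edge goes into the higher range.
    Returns (bins, skipped), where bins[i] lists the ORFs in range i.
    """
    bins = [[] for _ in range(len(edges) + 1)]
    skipped = []
    for name, length in sorted(orf_lengths.items()):
        if length < min_len:
            skipped.append(name)  # below the conservative design cutoff
            continue
        bins[bisect_right(edges, length)].append(name)
    return bins, skipped


if __name__ == "__main__":
    # Made-up ORF names and lengths, for illustration only.
    example = {"orfA": 350, "orfB": 400, "orfC": 999,
               "orfD": 1000, "orfE": 2500, "orfF": 4800}
    bins, skipped = partition_orfs(example)
    print(bins)
    print(skipped)
```

Each resulting group could then be captured and amplified separately, which is one way the size-fractionation idea might be put into practice.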
Larman said he plans to use the new method to identify protein targets of immune responses, while Parekkadan said he primarily plans to employ it as a platform to test and discover new drugs.
"We're very interested in using this in the space of autoimmune disease, by going into a target tissue and cloning all the expressed proteins, and then screening them against an autoimmune patient's immune system to identify what the molecular targets driving their disease are," Larman explained. "We do that now using a phage-based system that displays peptides, but peptides don't always contain the epitope information present in full-length proteins."
The method could also be useful for researchers working with model organisms for which open reading frames have not been cloned yet, he said.
Besides making protein expression libraries, a potential application of LASSO probes is the capture of large DNA targets for long-read sequencing, for example, with Pacific Biosciences' or Oxford Nanopore's platforms. "As novel long-read sequencing technologies emerge, there is an increasing need for novel target enrichment methods that allow highly multiplexed enrichment of kilobase-sized DNA," Hoischen said.
Others have explored using hybridization-based methods, for example, capture probes from Roche NimbleGen or Integrated DNA Technologies, to target DNA up to 8 kilobases in size for sequencing with PacBio's system, and "it remains to be seen how these approaches compare to the LASSO method," Hoischen said. The standard MIP workflow his lab uses is easy and allows many samples to be processed in parallel, "so I can imagine that a MIP-based assay for long reads can be attractive," he said.
"Ultimately, it will be very interesting whether LASSO will allow even longer captures," Hoischen said. Large DNA fragments have already been enriched using CRISPR/Cas9, he noted, but that method currently has low throughput and has not been highly multiplexed yet.
Another potential application, which the researchers did not explore in their paper, is to target non-protein-coding DNA from eukaryotic genomes. "It will be interesting to see whether this also works on human genomic DNA and will allow even larger targets to be enriched," Hoischen said. "There is a huge need for sequencing entire genomic regions of genes, or entire regulatory regions including enhancers and promoters, in a highly multiplexed fashion."
Larman agreed that cloning large fragments of human DNA is an interesting application, but his team has not explored this yet. "I don't see why that would not work," he said.