NEW YORK – Researchers from Sweden's Science for Life Laboratory have developed a new library preparation method for multiplexed next-generation sequencing of formalin-fixed paraffin-embedded samples. When sequencing thousands of samples, the method can save money by eliminating the need to generate a library for each sample.
The method uses restriction enzyme-based fragmentation and in vitro transcription to barcode and amplify genomic DNA prior to library construction. The method, dubbed CUTseq and described in a Nature Communications paper published in October, works with both fresh cells and archival samples.
"It's a new way of making libraries for NGS that allows you to pull in multiple samples and reduce costs in that way," said Nicola Crosetto, a SciLifeLab researcher at the Karolinska Institutet and a senior author of the paper. By making libraries with larger numbers of samples, "we quickly gain advantage over making them separately," he said, which could be useful in several scenarios, such as a bigger sequencing facilities, or a diagnostic setting, "where you want to screen a large volume of samples in parallel," such as multi-region tumor sampling. The method also reduces workflow time to approximately eight hours.
Using a liquid-handling robot from Dispendix to dispense reagents in nanoliter volumes, the researchers were able to lower costs to around $14,000 for 1,000 samples, "assuming we make libraries of 96 samples each," Crosetto said. By comparison, library prep for that many samples could cost at least $22,000 using the least expensive commercially available kit, New England Biolabs' NEBNext Ultra II. The savings grow as more samples are prepared, he said. CUTseq could process 2,000 samples for as little as $20,200, while doing so with NEBNext kits would cost approximately $45,000.
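To put those totals on a per-sample basis, the arithmetic can be sketched in a few lines of Python. The dollar figures are the ones Crosetto quoted (some of them stated as "at least" or "approximately"), so the results are rough estimates; the function and field names are illustrative only.

```python
# Library prep totals quoted in the article, in US dollars.
# The kit figures are approximate ("at least", "approximately").
QUOTED_COSTS = {
    1000: {"cutseq": 14_000, "kit": 22_000},
    2000: {"cutseq": 20_200, "kit": 45_000},
}

def summarize(n_samples, cutseq_total, kit_total):
    """Return per-sample costs and the savings of CUTseq over the kit."""
    return {
        "cutseq_per_sample": cutseq_total / n_samples,
        "kit_per_sample": kit_total / n_samples,
        "savings": kit_total - cutseq_total,
        "savings_fraction": 1 - cutseq_total / kit_total,
    }

if __name__ == "__main__":
    for n, c in QUOTED_COSTS.items():
        s = summarize(n, c["cutseq"], c["kit"])
        print(
            f"{n} samples: CUTseq ${s['cutseq_per_sample']:.2f}/sample, "
            f"kit ${s['kit_per_sample']:.2f}/sample, "
            f"savings ${s['savings']:,} ({s['savings_fraction']:.0%})"
        )
```

On these figures, the per-sample cost of CUTseq falls as the batch grows while the kit cost per sample stays roughly flat, which is the scaling effect Crosetto described.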
Additional savings come from the fact that the barcoding allows for accurate quantification of how much of each sample is in the library after sequencing, rather than having to do that before the run, he said.
Typically, sequencing multiple samples in parallel has been achieved by pooling libraries that have been prepared from individual samples. This includes rapid methods that directly incorporate sequencing adapters into genomic DNA by engineered transposases, such as the indexed Nextera kits sold by Illumina.
Whole-genome amplification methods, often applied in single-cell studies, including degenerate oligonucleotide-primed PCR, multiple displacement amplification (MDA), multiple annealing and looping-based amplification cycles (MALBAC), single-cell MDA, and linear amplification via transposon insertion (LIANTI), allow for direct barcoding and sample pooling into a single, multiplexed library, the authors noted, but are costly and require intact DNA, a problem for the FFPE samples often found in pathology.
Crosetto said that being able to prep DNA from such samples was a "primary motivation" for developing CUTseq. "I'm interested in genomic instability in cancer, where one approach becoming more popular is to sequence multiple regions in the same tumor." Such studies require clinical samples, often FFPE samples from pathology labs.
"We knew the challenge was to develop a method that would preamplify DNA, but make libraries from many samples in a way we could afford," he added.
The CUTseq workflow digests genomic DNA using a type-II restriction endonuclease that leaves staggered ends, which are then ligated to specialized double-stranded DNA adapters that contain a sample-specific barcode sequence, a unique molecular identifier, an Illumina sequencing adapter, and a T7 promoter sequence.
The researchers chose from a list of commercially available restriction enzymes, settling on NlaIII and HindIII, which respectively cut genomic DNA on average every 136 and 2,274 base pairs. The final sequencing library is created using the Illumina small RNA library prep kit.
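How enzyme choice sets fragment size can be illustrated with a toy in silico digest. This is a minimal sketch, not the authors' pipeline: it assumes the standard recognition sites for the two enzymes (CATG for NlaIII, AAGCTT for HindIII), the function name and the short test sequence are made up for illustration, and only one strand is scanned because both sites are palindromic.

```python
import re

# Recognition site and cut offset within the site (top strand).
# NlaIII cuts after CATG; HindIII cuts between the first A and AGCTT.
SITES = {
    "NlaIII": ("CATG", 4),
    "HindIII": ("AAGCTT", 1),
}

def digest(seq, enzyme):
    """Split seq at every recognition site of the given enzyme."""
    site, offset = SITES[enzyme]
    seq = seq.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

if __name__ == "__main__":
    toy = "GGCATGTTAAGCTTCCCATGA"  # hypothetical sequence, not real data
    for enzyme in SITES:
        fragments = digest(toy, enzyme)
        print(enzyme, [len(f) for f in fragments])
```

Under a naive model with equal base frequencies, a four-base site would occur about once every 256 bases and a six-base site about once every 4,096. The averages quoted above, 136 and 2,274 base pairs, are shorter, which is unsurprising given that real genomes do not have uniform base composition.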
In a benchmarking study, the SciLifeLab researchers compared CUTseq to the NEBNext library preparation kit and performed copy number profiling. Copy number profiles for 10 tumor samples analyzed with both kits showed high correlations, as measured with Pearson's correlation scores. For example, one melanoma sample showed Pearson's correlation of 0.993 and three breast cancer samples showed correlations above 0.98; the lowest correlation was with a colon adenocarcinoma sample, which had a correlation of 0.835. Crosetto noted that the sample with the lowest correlation had a "flat" profile, with "no detectable copy number alterations, making it intrinsically more difficult to achieve very high correlation coefficients."
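Crosetto's point about the "flat" profile follows from how Pearson's correlation is computed, and a small sketch makes it concrete. The numbers below are invented toy values, not data from the study: each pair of profiles is the same underlying copy number track plus a different small noise pattern, standing in for two library prep methods.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-bin noise for two library prep methods.
NOISE_A = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
NOISE_B = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]

def measure(true_profile, noise):
    """Simulate a measured profile as truth plus method-specific noise."""
    return [t + e for t, e in zip(true_profile, noise)]

if __name__ == "__main__":
    altered = [2, 2, 3, 3, 1, 1, 2, 2]  # toy profile with a gain and a loss
    flat = [2] * 8                      # toy profile with no alterations
    for name, truth in (("altered", altered), ("flat", flat)):
        r = pearson(measure(truth, NOISE_A), measure(truth, NOISE_B))
        print(f"{name}: r = {r:.3f}")
```

With the same noise in both cases, the profile containing real gains and losses correlates strongly between methods, while the flat profile does not, because in the flat case the only variation left to correlate is the noise itself.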
"When I present the method, people get concerned by the possibility of error rates by the T7 polymerase," Crosetto said. "We've shown that the error rate is very low. We can even call SNPs reliably compared with other methods. The only disadvantage is we cover less. We're not proposing this as a substitute for whole-exome sequencing, but it may be very advantageous when you want to screen many samples from the same individuals." He added that copy number profiling can be done with as little as 125 picograms of genomic DNA.
While Crosetto sees potential value in the method, he said he hasn't applied for a patent and doesn't plan to kitify CUTseq at the moment. His preferred path would be to perform library prep as a service. "But we have not really decided what to do." He's focused more on applying the method in research.
Crosetto said his lab is collaborating with pathologists at Karolinska to use CUTseq in profiling large cohorts of tumor samples collected before and after chemotherapy. The lab is also working on adapting CUTseq for use in single-cell studies.