
Swedish Tagging Method Enables Local Assembly of Illumina Reads into Long Continuous Sequences


Researchers in Sweden have developed a hierarchical tagging method that can be used to obtain long sequence stretches from short Illumina reads.

The method, called Tile-seq, was published last month in Scientific Reports by a group led by Joakim Lundeberg at the Science for Life Laboratory at the KTH Royal Institute of Technology School of Biotechnology.

It is similar to Illumina's Moleculo technology in that it assembles long DNA sequences from shorter fragments, but it generates these fragments in a different way.

According to Lundeberg, the method has potential for metagenomic studies as well as for studies of regions of low complexity. These and other applications require long, accurate sequence reads, he said, prompting him and his colleagues to develop the method, which uses two types of barcodes to generate hierarchical reads.

Tile-seq starts by amplifying the DNA of interest by long-range PCR, which also introduces a unique barcode. For the Scientific Reports study, the researchers used a protocol that generates 3-kilobase amplicons, though Lundeberg said longer ones are possible.

Next, the scientists treat the barcoded 3-kb amplicons with an exonuclease, which digests them from one end. During this reaction, they take samples at several timepoints and add another label. This results in a ladder of fragments of different lengths for each amplicon, each labeled with an amplicon-specific barcode at one end and a timepoint-specific barcode at the other.

They then circularize the fragments, which joins the two barcodes, break the circles apart, enrich for the junctions, sequence them on an Illumina sequencer, and assemble the amplicon sequence from those overlapping reads.
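As a rough illustration of how the two barcode levels could be used downstream, the following Python sketch bins junction reads first by amplicon barcode and then by timepoint barcode. The read layout (amplicon barcode, then timepoint barcode, then insert sequence), the barcode lengths, and the function name `group_reads` are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def group_reads(reads, amp_len=8, tp_len=4):
    """Bin reads by amplicon barcode, then by timepoint barcode.

    Hypothetical layout assumed for each read:
    [amplicon barcode][timepoint barcode][insert sequence]
    """
    groups = defaultdict(lambda: defaultdict(list))
    for read in reads:
        amplicon_bc = read[:amp_len]
        timepoint_bc = read[amp_len:amp_len + tp_len]
        insert = read[amp_len + tp_len:]
        groups[amplicon_bc][timepoint_bc].append(insert)
    return groups


# Toy reads: two from one amplicon at different timepoints, one from another.
reads = [
    "AAAAAAAA" + "CCCC" + "ACGT",
    "AAAAAAAA" + "GGGG" + "TTTT",
    "TTTTTTTT" + "CCCC" + "GGCC",
]
groups = group_reads(reads)
```

Because the timepoint barcode reflects how far the exonuclease had digested the amplicon, reads in each timepoint bin would be expected to come from a similar position along it, which could then guide a local assembly per amplicon.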

"We put a lot of sequencing effort into these reads, but the assembly generates very highly accurate long read sequences," Lundeberg said.

In their paper, the scientists applied Tile-seq to the 48-kilobase bacteriophage lambda genome, which they sequenced with 19 PCR amplicons. In addition, they used it to sequence the TP53 gene in four different human cell lines using a single amplicon and to analyze a highly variable, low-complexity region in the canine mitochondrial genome.
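A back-of-the-envelope calculation gives a sense of how much neighboring amplicons might overlap in the lambda experiment. This sketch assumes a linear 48-kilobase target tiled end to end by 19 equal-length 3-kilobase amplicons with evenly spaced overlaps; the actual amplicon design is not described here, and the function name `mean_overlap` is hypothetical.

```python
def mean_overlap(genome_len, n_amplicons, amplicon_len):
    """Average overlap between adjacent amplicons under even tiling.

    Assumes a linear target covered end to end by equal-length
    amplicons, so the excess sequence is shared among the
    (n_amplicons - 1) junctions between neighbors.
    """
    total_amplicon_bases = n_amplicons * amplicon_len
    return (total_amplicon_bases - genome_len) / (n_amplicons - 1)


overlap = mean_overlap(48_000, 19, 3_000)
print(overlap)  # 500.0
```

Under these assumptions, adjacent amplicons would share about 500 bases on average.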

One of the limitations of the method is the circularization step, which gets increasingly inefficient with longer molecules. Lundeberg said his team tried various PCR amplicon lengths and found the method worked most robustly with 3-kilobase amplicons.

In the meantime, they have been testing microdroplet PCR technology to improve the circularization efficiency of longer fragments. Having just one DNA molecule in each droplet increases the chance of the two ends meeting, Lundeberg said, adding that this approach "looks very promising."

The researchers are also working on increasing the length of the PCR amplicons. Lundeberg said amplicons of 10 kilobases are "reachable" using PCR, and the length could be increased further using technology based on phi29 polymerase, which is still at the planning stage.

The Swedish group has also automated Tile-seq in its lab, which helps in particular with taking samples at the exact same timepoints during the exonuclease reaction. In addition, they have improved the adapter sequences, Lundeberg said.

Based on their data, the method could be multiplexed to 50,000 amplicons per Illumina HiSeq lane, Lundeberg said, although the researchers have not done that yet.

At the moment, they are applying Tile-seq to metagenomic samples, where they are sequencing 3-kilobase stretches of ribosomal DNA instead of the short 16S rRNA sequences analyzed in many other studies.

According to Jerrod Schwartz, a postdoctoral fellow in Jay Shendure's lab in the department of genome sciences at the University of Washington, who is familiar with this and related methods, one key benefit of Tile-seq is that each target DNA has a unique tag, "thereby enabling the detection and phasing of rare variants over large distances."

Another advantage of Tile-seq, which he said is a "clever advance" of tag-directed assembly of short reads, first described by Shendure's group in 2010, is that it can be used either in a target-specific manner or for shotgun sequencing.

"Their method could potentially reduce the amount of sequencing required to phase targeted regions in many individuals," Schwartz told In Sequence via e-mail. It could also help improve de novo genome assemblies by targeting specific structural variants or gaps for a localized assembly, he said.

Besides the inefficiency of circularization, challenges include chimeras introduced during the initial PCR reaction as well as PCR bias, he said.

According to Schwartz, both Tile-seq and Illumina's Moleculo technology rely on long-range PCR to amplify DNA. "The key difference is when and how the clonal tagging is done to reassociate the short reads together," he said. Whereas Tile-seq tags its targets during the PCR reaction, Moleculo tags after shearing the PCR products in clone dilution pools.

"While Moleculo may be able to generate longer reads due to having no circularization step, it may come at the cost of having a reduced dynamic range for rare variant detection, having to optimize a long-range PCR over many different target sequences, and having to deal with chimeras without the aid of dual barcodes," said Schwartz.

Last year, Schwartz and his colleagues published an optical sequencing approach that obtains long-range positional information for Illumina short reads (IS 3/20/2012).

That method, he said, does not rely on PCR, so it has no issues with PCR bias or chimeras, but its efficiency is "a bit lower," which he and his colleagues are currently trying to improve.

Lundeberg's team has been developing Tile-seq in collaboration with LingVitae, a Swedish company that has been working on so-called binary sequencing, which converts DNA molecules into DNA that contains a binary code.

Preben Lexow, LingVitae's CEO, told In Sequence by e-mail that the method is protected under a divisional application from LingVitae's US Patent 6,723,513, "Sequencing method using magnifying tags." He said that his understanding is that the application covers the use of Tile-seq not only in conjunction with LingVitae's binary sequencing method but also as described in the paper.
