NEW YORK – A new protocol for barcoding long pieces of DNA allows for inexpensive phasing and scaffolding of genomes and could replace commercial linked-read sequencing solutions that are no longer available.
Haplotagging, developed by researchers at the Max Planck Friedrich Miescher Laboratory (FML) in Tübingen, Germany, combines transposase-laden magnetic beads and split-pool combinatorial indexing to barcode the different haplotypes of a diploid genome. According to its creators, the method is fast and cheap: Sample preparation can be done in two days for less than $2, and it works with low-coverage Illumina sequencing, allowing genotyping at approximately $15 per sample, including sequencing costs.
"Why give up haplotype information, when you don't have to?" said Frank Chan, a researcher at FML whose lab developed the method. "With haplotagging, you can now sequence more samples, at lower coverage, but still get better results."
His lab has already collaborated on two studies published last month, one in the Proceedings of the National Academy of Sciences and one in Nature Genetics. In the PNAS paper, researchers from the University of Cambridge used haplotagging to analyze two species of South American butterflies and the formation of hybrids between them. In the Nature Genetics paper, researchers from the University of Oxford used haplotagging data to perform diploid genotype imputation.
Other researchers are already adopting the method for various purposes, including de novo genome assembly, phasing and scaffolding genomes, and genotyping. For the Wellcome Sanger Institute's Darwin Tree of Life project, which aims to sequence all eukaryotic species in Britain and Ireland, haplotagging's emergence couldn't be timelier. Researchers there had been using 10x Genomics' Linked-Reads product, which was discontinued last year, together with Illumina sequencing to validate or correct long-read assemblies.
Haplotagging "works beautifully for small genomes," said Mark Blaxter, program lead at Darwin Tree of Life. "We're now looking at creating libraries for larger genomes. It's all looking positive, but we haven't adopted it as part of our core production line. As our 10x reagents run out imminently, we're hoping to start using it."
The method "will allow for more precise selection and ensure the maximization of genetic gains within breeding programs," Andre Eggen, senior market development manager for agrigenomics at Illumina, said in an email. "For organisms for which no reference genome exists, this new cost-effective methodology allows for population-based haplotype reconstruction, even at lower sequencing depths compared to classical approaches."
Haplotagging could even help phase challenging regions in the human genome, such as HLA regions and structural rearrangements, Chan said.
Haplotagging is inspired by contiguity-preserving transposition, published in 2014 by researchers led by Frank Steemers, formerly of Illumina's advanced research group, and Jay Shendure of the University of Washington. Those researchers published an updated version of the technology in 2017.
The concept takes advantage of the propensity of long DNA molecules to wrap around oligo-barcoded microbeads. With transposase, those barcodes can be inserted into the DNA, marking each molecule at multiple sites before it is broken up for short-read sequencing. The original protocols, however, offered low barcode diversity and required custom primers, Chan said, making them difficult to scale. In 2017, Marek Kucka, a research specialist in Chan's lab, set about improving the method to increase barcode diversity and make it work with any Illumina sequencing instrument. By 2018, the team had completed successful pilot studies.
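Because every fragment from one bead-wrapped molecule carries the same barcode, software can regroup short reads into the long molecules they came from. The Python sketch below illustrates that idea under stated assumptions: reads are already aligned and tagged, and reads sharing a barcode are split into separate molecules whenever they sit more than a hypothetical 50 kb apart. The `Read` type, the `infer_molecules` function, the threshold, and the barcode strings are all illustrative, not part of Chan's software.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # bead barcode attached during tagging (illustrative format)
    chrom: str    # chromosome the read aligned to
    pos: int      # alignment position in bp


def infer_molecules(reads, max_gap=50_000):
    """Group reads that share a barcode into putative long molecules.

    Reads with the same barcode on the same chromosome are merged while
    consecutive positions lie within max_gap bp (a hypothetical threshold).
    Returns tuples of (barcode, chrom, start, end, read_count).
    """
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = p, 0
            count += 1
            prev = p
        molecules.append((barcode, chrom, start, prev, count))
    return molecules
```

With reads at 1,000, 21,000 and 40,000 bp sharing one barcode, the sketch reports a single molecule spanning that interval, while a read with the same barcode 860 kb further along is treated as a separate molecule, since the same barcode can end up on more than one piece of DNA.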
Haplotagging employs 85 million barcodes with a segmental barcode design. Once the DNA is tagged, the protocol requires only PCR amplification, cleanup, and standard Illumina library prep. It needs just 1 ng of DNA input, a magnet, 96-well plates, and a set of pipettes, Chan said.
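A segmental design reaches a large barcode count by combining a few short segments rather than synthesizing every barcode individually. The sketch below assumes four segments (labeled A to D here) with 96 variants each, one per well of a 96-well plate, which gives 96^4 = 84,934,656 combinations, consistent with the roughly 85 million cited. It also simulates split-pool assembly: in each round, pooled beads are split across 96 wells, each well adds its own segment variant, and the beads are pooled again. The segment labels, the `split_pool_barcode` helper, and the barcode string format are illustrative assumptions, not the published bead chemistry.

```python
import random

SEGMENTS = ("A", "B", "C", "D")  # assumed: four barcode segments
VARIANTS_PER_SEGMENT = 96        # assumed: one variant per well of a 96-well plate


def barcode_space(n_segments=len(SEGMENTS), n_variants=VARIANTS_PER_SEGMENT):
    """Number of distinct barcodes a segmental design can produce."""
    return n_variants ** n_segments


def split_pool_barcode(rng):
    """Simulate one bead passing through split-pool rounds.

    Each round sends the bead to a random well, which appends that well's
    segment variant; the result looks like 'A07B93C12D45' (hypothetical format).
    """
    return "".join(
        f"{seg}{rng.randint(1, VARIANTS_PER_SEGMENT):02d}" for seg in SEGMENTS
    )


rng = random.Random(0)
beads = [split_pool_barcode(rng) for _ in range(10_000)]
print(barcode_space())   # 84934656, i.e. roughly 85 million
print(len(set(beads)))   # distinct barcodes among 10,000 simulated beads
```

Only 4 × 96 = 384 wells' worth of segment oligos are needed to generate the full space, which is the practical appeal of split-pool indexing over making each barcode separately.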
"Since it's all about retaining original haplotype information, in fact the trickiest part is to get long input DNA," Chan said. "The quality of the DNA is crucial in getting high molecule size, and by extension, ensuring phasing success. From our own perspective in assembling the beads, getting the Tn5 transposase and oligo concentrations just right is key. Most users likely won't be making their own beads, so they probably won't have to worry about that, though."
In a pilot study using a protocol that has since been improved upon, Chan's team compared haplotagging with 10x's Linked-Reads product by analyzing mouse chromosome 19. Haplotagging produced more than 20,000 barcoded molecules across the chromosome, with only 25 mixed molecules, compared to nearly 18,000 molecules with 765 mixed molecules for Linked-Reads.
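Put as rates, the pilot figures above show the gap more plainly. The short sketch below divides mixed molecules by total molecules using the article's approximate counts (20,000 and 18,000); because the haplotagging total was "more than 20,000" and the Linked-Reads total "nearly 18,000," the computed values are rough upper and lower bounds, respectively, rather than exact rates.

```python
def mixed_fraction(mixed, total):
    """Fraction of barcoded molecules flagged as mixed."""
    return mixed / total


# Approximate counts from the mouse chromosome 19 pilot comparison.
pilot = {
    "haplotagging": (25, 20_000),
    "Linked-Reads": (765, 18_000),
}
for method, (mixed, total) in pilot.items():
    print(f"{method}: {mixed_fraction(mixed, total):.3%} mixed")
```

That works out to about 0.125 percent of molecules mixed for haplotagging versus about 4.25 percent for Linked-Reads, roughly a 34-fold difference.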
Haplotagging phased 99.74 percent of heterozygous SNPs, Chan said, compared to 99.8 percent for Linked-Reads. "New data will be better," he said.
One caveat is library diversity, Blaxter said. "Because read depths are lower [than with linked reads], sequence diversity wasn't as high," he said. "But for some approaches, that doesn't matter so much."
So far, Chan's group has mostly collaborated with core facilities, but it is looking to make the method more widely available. He doesn't plan to spin out a company himself, but said, "We're actively looking for commercial partners and to work with them to bring this to market." He declined to name those companies but noted that he is filing a patent.
Another limit to the technology is that it doesn't come with mature data analysis pipelines, as 10x's product did, Blaxter said. But Chan said his team is already developing algorithms to work with the data generated by the method.