This story has been updated to include additional information from Michael Quail.
HOLLYWOOD, Florida – Illumina provided the first explanation for how its new Complete Long Reads (CLR) product will work as well as some customer data and testimonials at a Wednesday workshop here at the Advances in Genome Biology and Technology annual meeting.
First, DNA molecules are tagmented to generate long fragment sizes. Then Illumina introduces so-called "landmarks" at certain intervals that allow long, repetitive regions to be "reconstructed" following short-read sequencing. What the landmarks are and how they are introduced, however, was not disclosed. Some observers have previously speculated that the product is a version of sequencing analysis by mutagenesis (SAM)-based technology invented by former Australian startup Longas Technologies.
The fact that the CLRs are reconstructed, or synthesized, from short reads seemingly contradicts comments made last fall by Illumina CEO Francis deSouza, who maintained that CLR is "not a synthetic long-read technology."
The workflow also includes an amplification step after landmarks are introduced, and a second tagmentation step helps create standard short-read sequencing libraries.
Illumina Chief Technology Officer Alex Aravanis showed slides with average N50 sizes — the length of the shortest contig for which longer and equal length contigs cover at least 50 percent of the assembly — and average phase block N50 lengths from three customers with early access to CLR: GeneDx, which analyzed CYP2D6, HBA2, and CYP21A2; Macrogen, which analyzed the GBA gene; and the Wellcome Sanger Institute, which also analyzed CYP2D6.
Those customers saw average N50 sizes of 5,917 bp, 5,740 bp, and 5,712 bp, and average phase block N50s of 213 kb, 177 kb, and 165 kb.
As recently as October 2022, Illumina had suggested that CLR generated "reads" with N50 sizes of 6 kb to 7 kb and phase block N50s of about 200 kb.
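For readers unfamiliar with the metric, the N50 definition given above can be sketched in a few lines of Python. This is only an illustration of the general statistic, not Illumina's analysis pipeline; the function name and the toy lengths are made up for the example and do not come from the customer data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths.

    Lengths are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, chosen only to make the arithmetic easy to follow.
toy_lengths = [2, 3, 4, 5, 6, 7, 8]
print(n50(toy_lengths))  # prints 6: 8 + 7 + 6 = 21, which is at least half of 35
```

The same calculation applies to phase block N50, with phase block lengths in place of contig lengths.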
"The library prep was straightforward with flexible input requirements," Wellcome Sanger's Michael Quail said, according to a slide presented by Aravanis. He later told GenomeWeb that the kit "extends what you can do with Illumina." According to Quail, Illumina instructed him to sequence the library using one lane of an S4 flow cell on a NovaSeq 6000.
Macrogen Head of NGS HyungIl Lee called the product "more convenient" than other long-read technologies, adding that low input and lack of extra equipment were bonuses.
In an exercise using the PrecisionFDA Challenge v2 dataset, Illumina CLR with Dragen interpretation on a NovaSeq 6000 achieved an F1 score — a combination of precision and recall — of 99.87 percent. That's up from 99.83 percent achieved by Illumina sequencing alone and also achieved by an unnamed competitor, according to Aravanis.
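The F1 score is conventionally defined as the harmonic mean of precision and recall. The sketch below shows that standard formula in Python; it is not the PrecisionFDA evaluation itself, and the function name and variant counts are invented for illustration rather than taken from Aravanis' slides.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Compute F1 from counts of true positives, false positives, and false negatives.

    precision = TP / (TP + FP)   share of reported calls that are correct
    recall    = TP / (TP + FN)   share of real variants that were reported
    F1        = 2 * precision * recall / (precision + recall)
    """
    if true_pos == 0:
        # With no true positives, precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct calls, 10 false calls, 30 missed variants.
print(f"{f1_score(90, 10, 30):.4f}")  # prints 0.8182
```

Because the harmonic mean is pulled toward the lower of the two values, a method cannot reach a high F1 by excelling at only precision or only recall.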
"They're trying," said Bruce Kingham, a core lab director at the University of Delaware who attended the workshop. "I think they want people to see it as a true competitor to true long reads, but I don't see that being the case," he said. "But there will be a place for it in support of their short reads. It will enhance their traditional short reads and what types of analysis they're able to do."
Peter Schweitzer, a core lab director at Cornell University, predicted that creating non-human CLR kits would require customization for every different genome. "That's definitely a downside," he said. "We do odd species. Someone I know wants to do oyster genomes."
The other speaker during the workshop was Niall Lennon, CSO of the Broad Institute's clinical research sequencing platform, who presented data generated by Illumina for the Broad on the new NovaSeq X. Illumina announced on Wednesday that the Broad Institute is the first customer to have received the platform. Lennon compared the NovaSeq X data to data his lab generated on a NovaSeq 6000 run of the same library.
The fraction of bases above Q30 is higher for NovaSeq X, he said, and analytical performance for both SNPs and insertions/deletions showed "no meaningful difference." The per-base accuracy also did not drop off as read length increased, he said.
Broad has ordered five NovaSeq X instruments, which will allow its core lab to offer a human whole genome at 30X coverage for as low as $350 for research use and a blended genome-exome, pairing a whole exome with whole-genome sequencing at 4X coverage, for as little as $99.
Kingham noted that he is very excited to get Illumina's new XLeap-SBS chemistry on his NextSeq 2000 benchtop sequencer. Aravanis reiterated that the chemistry would be coming to that mid-throughput line in the first half of 2024. "I wish it were sooner, but getting it in a year is going to ease my anxiety about costs," Kingham said.
Schweitzer added that his lab has had early access to 2x300 bp sequencing kits for the NextSeq 2000, announced at the 2022 AGBT meeting. "The first one we ran looked great," he said.