
This article has been updated with the most recent list price for the Miro Canvas digital microfluidics system.
NEW YORK – Five years ago, the National Institutes of Health's Center for Alzheimer's and Related Dementias (CARD) launched an ambitious project — analyzing thousands of brain tissue samples from patients and controls using nanopore long-read sequencing in order to fill the knowledge gaps about disease-related genomic variation.
With a project of this scale, the need for automation became apparent. But with no commercial solution in sight, the CARD researchers faced the task of building an automated long-read sequencing workflow from the ground up. It wasn't an ordinary endeavor, especially during a time when most long-read sequencing projects were only working on a handful of samples.
"When CARD and this project were planned very early on, there was not really a production-level protocol available for any long-read technology," said Miten Jain, an assistant professor of bioengineering and physics at Northeastern University and a CARD collaborator. "I don't think either company — [Oxford Nanopore Technologies or Pacific Biosciences] — had a circumstance where they had to suddenly think about how people can do a few thousand genomes a year in a dedicated facility."
Now, a few years into the project and having sequenced the genomes of 3,000 samples with long reads, the CARD team has successfully automated key steps of the workflow through trial and error, though a few bottlenecks remain.
In the meantime, other research groups embarking on large-scale long-read sequencing projects, as well as companies supporting their efforts, have been developing sample prep automation methods, including for the extraction of high molecular weight (HMW) DNA and for library preparation, though full automation is still in the future.
'A chicken and egg problem'
Extracting and preserving long DNA molecules in an automated fashion from a variety of sample types can be a challenge to begin with.
According to Rosemary Sinclair Dokos, chief product and marketing officer at Oxford Nanopore, there is still a gap in the market for extracting nucleic acids longer than 10 kb to 15 kb. "In general, DNA extraction is always a chicken-and-egg problem," she said. "The big companies want to see a market for long reads before they invest millions of dollars in developing long-read extraction methods for high-throughput applications. But, of course, you don't have a high-throughput long-read market if you don't have an extraction method before."
Meanwhile, the CARD team developed its own automation for HMW DNA extraction. Working closely with Circulomics, a sample prep firm PacBio acquired in 2021, the researchers automated the Circulomics Nanobind DNA extraction kit on the KingFisher platform from Thermo Fisher Scientific. Looking to achieve an average N50 read length of around 30 kb, the team can now do automated DNA extraction in batches of 24 samples in a few hours, Jain said. They are still faced with the challenge of cutting a piece of brain tissue of very specific size to go into the extraction, though, which is hard to automate.
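For readers unfamiliar with the N50 metric mentioned above, the following is a minimal sketch of how a read-length N50 is commonly computed: the length at which reads of that size or longer account for at least half of all sequenced bases. The function name and the example read lengths are hypothetical illustrations, not data or code from the CARD project.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    N50 is the length L such that reads of length >= L together
    contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical read lengths in bases (illustrative only).
example_reads = [45_000, 32_000, 30_000, 12_000, 8_000, 3_000]
print(n50(example_reads))
```

Because the metric is weighted by bases rather than by read count, a handful of very long reads can raise the N50 even when most reads in a run are short.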
For PacBio, Circulomics' Nanobind technology has been crucial for automating HiFi sequencing workflows and supporting more scale, according to Aaron Wenger, the company's senior director of product management.
The acquisition of Circulomics was mainly motivated by PacBio's higher-throughput Revio platform, he said, which was still under development at the time. The Nanobind technology, which uses small magnetic disks with a nanostructured silica surface to isolate HMW DNA, was the "most automatable," Wenger noted.
Circulomics was folded into the PacBio sample prep team, he said, which has since released a number of kits based on the Nanobind method. Specifically, the Nanobind HT CBB kit, which supports HMW DNA extraction from cells, bacteria, and blood, was developed for automation on Thermo Fisher KingFisher and Hamilton Nimbus Presto instruments.
While the Nanobind kits are now marketed with PacBio branding, they are not exclusive to the company's customers, Wenger said, and the firm still sells kits to other researchers, including nanopore sequencing users.
One issue with affinity-based DNA extraction, including the Nanobind technology, is the risk of damaging the DNA molecules through liquid handling. "The shearing force of just pipetting it up and down three times would already break your DNA," said Alexander Hoischen, a professor at Radboud University Medical Center in the Netherlands whose team has optimized an automated HMW DNA extraction protocol using the Nanobind technology for optical genome mapping (OGM) applications.
Because optical genome mapping requires long, intact DNA molecules just as long-read sequencing does, genome mapping companies are also seeking automation solutions to boost adoption of their technologies. Bionano Genomics, for instance, offers the Ionic system for automated DNA extraction. The system is based on isotachophoresis (ITP) and was originally developed by Purigen Biosystems, which Bionano acquired in 2022.
ITP "concentrates the DNA and moves it through the consumable in such a way that all of the residue, the proteins, all the inhibitors, get left behind in a trailing wave" without using any pipettes, Mark Oldakowski, Bionano's chief operating officer, explained.
So far, Bionano has been optimizing the Ionic system, which has a list price of $50,000, for extracting ultra-long DNA — 250 kb to 300 kb on average — for OGM applications.
While the platform can in principle be used for long-read sequencing, adoption for that application remains unclear. "We are, at this point, not investing our resources to market the Ionic platform to sequencing customers," Oldakowski said. "But we are very open to work with long-read [sequencing] partners who would want to look at incorporating this into their workflows."
Library prep automation
Extraction of nucleic acids is only part of the battle of long-read sequencing sample prep — automating library prep comes with its own challenges.
For one, it requires gentle handling of the sample. "The degradation or fragmentation of DNA can reduce the read length and lead to biases in the data," said Michael Mouradian, VP of scientific strategy and market development at Hamilton.
Hamilton is part of the so-called PacBio Compatible program, which includes a portfolio of sample prep automation and bioinformatics platforms and methods that have been validated to work with HiFi long-read sequencing.
Given that HMW DNA can be "very viscous," the speed of pipetting as well as the orifice size of the pipette tips are important factors for automating a long-read library prep workflow, said Michael Brilhante, product manager for genomics at Tecan.
A member of the PacBio Compatible program, Tecan has optimized automation of the PacBio SMRTbell Prep Kit 3.0 on its DreamPrep NGS Compact platform, which can generate up to 48 libraries simultaneously in five hours.
Revvity — another partner in the program — has automated some of PacBio's library prep kits, including the SMRTbell Prep Kit 3.0, on its Sciclone G3 NGSx workstation. "With long-read sequencing, there are a lot of concerns with shearing those samples," said Nicole Madamba, senior applications manager for liquid handling platforms at Revvity. "So we were very cautious with that when we were developing the liquid handler."
Oxford Nanopore has also been fostering automation of its library prep protocols. During the COVID pandemic, the company started to "invest very heavily in automation," Sinclair Dokos said, both in-house and with external partners.
In collaboration with Tecan, for example, the company automated its Ligation Sequencing Kit XL V14 as well as its Native Barcoding Kit 96 V14 on Tecan's DreamPrep NGS platform, which can process up to 96 samples.
In addition, the Ligation Sequencing Kit XL V14 has been automated on the Hamilton NGS Star 96 instrument, according to Oxford Nanopore's marketing material.
The company also has an in-house lab "with most of the key automation platforms" to help develop automation protocols, Sinclair Dokos said. In addition, it works with customers to help them achieve automation on their robots.
Still, researchers sometimes find themselves in need of optimizing workflows on their own, especially if they are working with DNA from unconventional samples that have not been extensively tested by the companies. For example, Jain said CARD's brain sequencing project "took about 10 months of somebody from our side working full time on optimizing the protocol to finally get it working."
In addition to preserving the integrity of the DNA molecules during liquid handling, long-read automation also needs to deal with "a lot of challenging liquids," such as viscous master mixes, said Maryia Karpiyevich, a product development scientist at SPT Labtech.
In collaboration with PacBio and the automation team at the UK's Wellcome Sanger Institute, SPT demonstrated that its Firefly platform can automate the PacBio SMRTbell library prep workflow using samples from the Darwin Tree of Life project. In addition, Karpiyevich said, SPT has supported customers automating nanopore sequencing workflows on its systems.
Meanwhile, electrowetting-based automation companies are touting their ability to handle HMW DNA extraction and long-read library prep automation by avoiding pipetting altogether. "We have capabilities on our platform to move, mix, split, and handle viscous fluids quite well," said Udayan Umapathi, founder and CEO of Volta Labs.
Since launching its first electrowetting-based sample preparation platform, called Callisto, last year, Volta has automated HMW DNA extraction as well as PacBio library prep on the platform. Using a so-called VoltaPure magnetic bead-based chemistry, the platform, which costs "just over $100,000," according to Umapathi, is designed to fit 24 samples per run. It can process whole blood samples while producing DNA molecules larger than 40 kb "consistently with high yields," according to Volta's website. In addition, Umapathi said the company is rolling out automated Oxford Nanopore library prep protocols "fairly shortly."
The Miro Canvas digital microfluidics system launched by Miroculus, which was acquired by Integra Biosciences in 2023, also enables automation of Oxford Nanopore's Ligation Sequencing Kit V14, as well as PacBio's SMRTbell whole-genome library prep, according to Integra's website. The instrument has a US list price of about $26,000 and can accommodate only one sample per run.
Volume, cost, and reliability
For Danny Miller, a physician-scientist and nanopore sequencing expert, digital microfluidics technology in general can be an attractive option, given the lower upfront capital investment as well as reagent input, which reduces operational costs.
"The challenge [with automation in general] has always been the cost, and then not knowing what the best solution is," noted Miller, an assistant professor of pediatrics as well as laboratory medicine and pathology at the University of Washington. "I can't spend $300,000 on a robot not knowing for sure that it is going to meet all my needs, and maybe not having the technical expertise to program it."
Mouradian from Hamilton agreed that automation platforms can mean "a heavy lift upfront" for customers from a capital expenditures perspective. According to him, an "entry level" Hamilton NGS automation platform can cost "low tens of thousands" of dollars, while the price tag is in the "hundreds of thousands" of dollars for high-throughput instruments.
Despite the associated cost, Miller, whose UW team is working on a clinical long-read sequencing assay as a frontline diagnostic test for many genetic disease indications, still believes automation is "essential for clinical use of long-read sequencing."
"Having automation not only simplifies the workflow but also ensures reproducibility in the clinical lab, and that is why we want to do it," he said.
Regarding digital microfluidics platforms to help automate the workflow, Miller noted that his team is having "challenges around selecting a device that can meet our throughput." Besides that, speed and reliability are also important. "On the clinical side, it needs to work well and be reliable," he said. "Making a mistake can set you back pretty far. Especially at the beginning, you have to be thoughtful and careful."
Manual first, automation second
According to some researchers, reagent kit manufacturers appear to treat automation almost as an afterthought.
Lesley Shirley, head of lab automation at the Sanger Institute, said many kits do not come with enough dead volume, for instance — extra reagents that are necessary for liquid handling platforms. "Anything you want to automate, you are going to have to have an element of dead volume, more than you would need if you are processing manually," she said. "[These are] conversations that we have had with all of these vendors over the years, and eventually they get there."
"We found that a lot of kit providers, when they develop their kits, they focus on manual [protocols] because this is the obvious thing," said SPT's Karpiyevich. "From my experience, only later, when they become involved with collaborations, they start thinking about automation."
Even when DNA extraction and library prep are fully automated, researchers still experience bottlenecks in other parts of the long-read sequencing sample prep workflow, such as DNA shearing, where genomic DNA is deliberately fragmented into certain target sizes for optimal sequencing.
"The DNA shearing is where we have been struggling," said Ben Farr, a lab automation specialist at the Sanger Institute. His team currently uses Hologic Diagenode's Megaruptor for shearing, which is "quite manually labor-intensive to set up" and difficult to scale other than by acquiring more instruments.
After shearing, there is size selection, which can be crucial for both PacBio and nanopore sequencing but remains a holdup for automation, according to researchers. "HiFi sequencing requires size selection, and nanopore sequencing benefits from size selection, as smaller molecules are more likely to get sequenced," Miller noted.
The NIH CARD team, for instance, has been using an instrument called BluePippin from Sage Science for selecting certain sizes of brain DNA. According to Jain, the size selection step "efficiently removes reads below 10 kb without significantly impacting data yield, ensuring that the remaining long reads provide sufficient coverage for high-quality, contiguous de novo assemblies from a single flow cell." However, the problem they face is that the BluePippin instrument can only handle four samples at once, and "there is no way to scale that up," Jain said.
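The scale of that bottleneck can be illustrated with some rough arithmetic, using only the figures cited in this article: about 3,000 samples sequenced, extraction batches of 24, and four samples per BluePippin run. The sketch below assumes, purely for illustration, that every sample passes through both steps once on a single instrument, with no reruns or parallel machines; the function name is hypothetical.

```python
import math


def runs_needed(n_samples, batch_size):
    """Number of instrument runs needed to process n_samples."""
    return math.ceil(n_samples / batch_size)


samples = 3000  # samples sequenced so far, per the article

# Automated HMW DNA extraction: 24 samples per batch.
extraction_batches = runs_needed(samples, 24)

# Size selection: four samples per run.
size_selection_runs = runs_needed(samples, 4)

print(extraction_batches, size_selection_runs)
```

Under those assumptions, size selection requires six times as many runs as extraction for the same number of samples.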
There are other solutions for shearing and size selection, though their actual adoption remains unclear. Tecan, for instance, has automated shearing methods on its DreamPrep NGS Compact and DreamPrep NGS platforms in partnership with PacBio, Brilhante said. The methods can generate DNA fragments of approximately 15 kb to 20 kb length and can be integrated with the downstream PacBio library preparation protocol.
In addition, Hamilton has automated DNA shearing on its NGS Star and Microlab Prep platforms as part of its PacBio workflow automation offerings.
According to Umapathi, Volta has automated size selection as part of the PacBio library prep workflow. The company also has a "proof of concept" for DNA shearing, but he declined to disclose more details given that "a lot of the IPs are still in filing."
Besides BluePippin, PacBio also included the Femto Pulse system from Agilent Technologies as well as the LightBench instrument from Yourgene Health in the PacBio Compatible program for size selection.
Still, Jain believes that the BluePippin instrument "is currently the most stringent size selection that's practical for our use on the market," especially for brain DNA, where the CARD study is "aiming for the highest-quality data possible."
That leads back to the chicken-and-egg problem: determining when customer demand is strong enough for sequencing firms, automation companies, and kit developers to invest in removing the final bottlenecks of sample prep automation.
"It is a bit of an effort from us as an automation partner, but we are also waiting for more demand from our customers, so it is a bit of a give and take," said SPT Labtech's Karpiyevich.
"From our side, we know that we want to develop certain library prep kits, and automation is a key portion of that, so we can initiate those efforts," PacBio's Wenger noted. "But really, the customer is probably the most important part."
That's why Jain believes the foundational work of large, government-funded research initiatives, such as CARD's Alzheimer's sequencing project, is crucial. "The automation happened in large part because NIH was willing to take that onus of the hard work that nobody would be able to do," Jain said. "It is unsung work … but it is also extremely critical in [analyzing and processing] those thousands of brain tissue samples."