CHICAGO – Illumina hopes that its two recent acquisitions of informatics companies will help customers streamline their genome interpretation workflows.
Earlier this week, the company announced that it had acquired Enancio, a French startup that makes genomics-specific data compression software. The move came four weeks after the sequencing instrumentation giant closed its acquisition of another informatics firm, Netherlands-based BlueBee.
Enancio, a precommercial firm headquartered in Cesson-Sévigné in the Brittany region of France, has developed lossless data compression technology that shrinks 50 GB of Illumina sequencer output to 10 GB, cutting data storage costs by a similar ratio. Illumina said the system compresses DNA sequence data by mapping reads to a reference genome and using a compact binary format to encode each read as a position and a list of differences.
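To make the reference-based approach concrete, here is a toy Python sketch. It is not Enancio's algorithm, whose details the company has not described here. It finds where a read best matches a reference by brute force, allows substitutions only (no insertions or deletions), ignores read names and quality scores, and packs the result in a hypothetical fixed-width layout: a 4-byte position, a 2-byte read length, a 1-byte difference count, then 3 bytes per substituted base. The names map_read, encode_read, and decode_read are illustrative.

```python
import struct

# Hypothetical fixed-width layout, little-endian, no padding.
HEADER = struct.Struct("<IHB")  # reference position, read length, number of differences
DIFF = struct.Struct("<Hc")     # offset within the read, substituted base (1 byte)


def map_read(read, reference):
    """Brute-force mapping: return the reference position with the fewest
    mismatches and the list of (offset, base) substitutions at that position."""
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_pos, best_diffs = 0, None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        diffs = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
        if best_diffs is None or len(diffs) < len(best_diffs):
            best_pos, best_diffs = pos, diffs
    return best_pos, best_diffs


def encode_read(read, reference):
    """Encode a read as its mapped position plus a list of differences."""
    pos, diffs = map_read(read, reference)
    if len(diffs) > 255:
        raise ValueError("too many differences for a 1-byte count")
    out = bytearray(HEADER.pack(pos, len(read), len(diffs)))
    for offset, base in diffs:
        out += DIFF.pack(offset, base.encode("ascii"))
    return bytes(out)


def decode_read(blob, reference):
    """Rebuild the exact read from the reference and the stored differences."""
    pos, length, ndiffs = HEADER.unpack_from(blob, 0)
    bases = list(reference[pos:pos + length])
    for k in range(ndiffs):
        offset, base = DIFF.unpack_from(blob, HEADER.size + k * DIFF.size)
        bases[offset] = base.decode("ascii")
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGCTTACGATCGGATCCA"
read = "TAGCCGTTAGGCT"  # matches the reference at position 8 except for one base
blob = encode_read(read, reference)
assert decode_read(blob, reference) == read  # lossless round trip
print(len(blob), "bytes encoded vs", len(read), "bytes as ASCII")  # 10 bytes encoded vs 13 bytes as ASCII
```

Decoding needs only the reference and the stored differences, so the original read is recovered exactly. A production tool would presumably use an indexed aligner rather than a linear scan and pack bases into 2 bits rather than a full byte.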
BlueBee, by contrast, makes technology that simplifies access to genomic data and helps lower the cost of storing, sharing, and managing the large volumes of data coming off sequencers. A spinout from Delft University of Technology in the Netherlands and Imperial College London, the firm offers a cloud-based platform built on proprietary software and sold as a service. The platform is designed to run on accelerated hardware such as Illumina's Dragen, which was originally developed by Edico Genome.
BlueBee, which is based in Rijswijk, the Netherlands, and has offices in Mechelen, Belgium, and San Mateo, California, provides a standardized platform that connects sequencers directly to its cloud-based supercomputing clusters. It also has a team of scientific advisors who help users customize pipelines to suit their specific needs.
"It was two acquisitions in quick succession," said Susan Tousi, Illumina's senior VP of product development. "That should show that we are more invested and engaged in helping our customers derive insights from the data that comes from sequencing."
She said these transactions give Illumina a chance to improve the way its customers aggregate, explore, and collaborate on sequencing data.
Illumina said that the volume of data generated by its sequencing systems worldwide grew by more than 50 percent in 2019, to 150 petabases. According to the company, that is equivalent to 500 years of high-definition video, a volume that made new compression technology necessary.
Enancio Principal Software Engineer Jennifer Del Giudice, who was CEO of the company before the acquisition, said that Enancio was founded to help laboratories and researchers manage the ever-increasing load of genomic data.
"Because of the need of getting into this genomic data for health and for precision medicine, we focused our activity on reducing the size of this data because there was a bottleneck there for the users," Del Giudice said.
Her company's unique compression algorithm is what Illumina was most interested in, she added.
A NovaSeq 6000 can generate as much as 6 terabytes of data and 20 billion reads in a single run over two days in dual flow-cell mode. Enancio's product, named Lena, can reduce the size of the files by 80 percent, which also reduces storage cost and transfer time.
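As a rough consistency check, the 50 GB-to-10 GB figure and the 80 percent reduction describe the same 5:1 ratio. The short Python sketch here applies that ratio to a full 6 TB NovaSeq 6000 run; it assumes, which the company has not stated, that the same rate holds at that scale, and the function name compressed_size is hypothetical.

```python
def compressed_size(original, reduction):
    """Size remaining after shrinking `original` by the fractional `reduction`."""
    return original * (1.0 - reduction)


# 50 GB -> 10 GB is a 5:1 ratio, i.e. an 80 percent reduction.
reduction = 1.0 - 10.0 / 50.0
print(f"reduction: {reduction:.0%}")  # reduction: 80%

# Hypothetical: apply the same rate to a 6 TB dual-flow-cell NovaSeq 6000 run.
run_tb = 6.0
print(f"{run_tb} TB -> {compressed_size(run_tb, reduction):.1f} TB")  # 6.0 TB -> 1.2 TB
```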
"For Illumina, that's an added value," Del Giudice said.
Enancio was founded in 2017 and is still in the precommercial phase. The company had three full-time employees at the time of the acquisition. CTO Guillaume Rizk and Del Giudice are principal software engineers for the acquired company, and Stéphane Picq, a senior software engineer, also joined Illumina. Rizk designed the algorithm that compresses data losslessly.
For a year after its founding, Enancio was "winding around" a bit before deciding to focus on the medical field, according to Del Giudice.
"That's where the data is increasing. That's where scientists are looking for all this information with the whole human genome," she said. "That's where the need to decrease the storage is important, and that's where the need for transferring the data is important." She added that transfer speed has only grown in importance as bioinformatics has shifted to cloud environments.
Enancio's operations will remain in Brittany. Illumina opened its first commercial office in continental Europe in 2017 in the Genopole biocluster hub outside Paris, and an Illumina spokesperson said that the company is committed to France.
Illumina plans to integrate the Lena technology into Dragen, a platform that combines field-programmable gate array (FPGA) hardware with proprietary software algorithms to shrink the data footprint and speed up processing.
"The goal is to integrate it fully into Dragen technology. It's a priority and it will be reaching the market very fast," Del Giudice said. "[Illumina aims] to have something very fast and also very accurate."
The reach and visibility of Illumina will be invaluable in getting this compression technology into the hands of thousands of scientists worldwide, she added.
Illumina will also integrate Enancio technology into its cloud storage systems. Tousi would not commit to a timetable on these projects, but said it would be "sooner rather than later."
The sequencing firm has been busy adding new features to Dragen since Illumina acquired Edico Genome two years ago. Illumina's most recent sequencer releases, the NextSeq 1000 and NextSeq 2000, are the first to have Dragen FPGA technology built in.
"We've massively grown the number of informatics solutions based on Dragen since the acquisition," Tousi said. "But our goal is always to streamline and simplify it, go from where you're getting raw data coming out of the sequencers — base calls — to … analyzing variant calls."
Since the purchase of Edico, Illumina has been trying to concentrate on "reducing the footprint of data," according to Tousi.
"It doesn't put this major additional computational burden on you to get to a compressed file and decompress that file," Tousi said of integrated compression in general and Enancio in particular.
She said that Illumina evaluated several different compression technologies before making an offer for the mostly unknown Enancio.
"Not only was it higher-level compression than what was available commercially, but also the algorithms are written in such a way that they're very computationally efficient," Tousi said. That efficiency makes it easier to build Lena into Dragen to compress raw sequencing files, produce accurate variant calls, and streamline secondary analysis.
The Dragen integration will cut processing time further because computations will happen directly on the FPGA, according to Tousi.
She said that compression has been on Illumina's informatics roadmap for some time. The same is true of the kind of data management technology that BlueBee brings to Illumina and of the hardware acceleration that Edico added to the company.
"When we see something that was kind of on our roadmap anyway to do and we see a team that's done it already, we see this opportunity to advance our roadmap and our offering," Tousi said.
For example, she said, "we had always thought about developing FPGA technology, accelerating algorithms, building it into our sequencers. This [Edico] team had already done it. We couldn't have done it better, and it accelerated us by many years." The Enancio and BlueBee acquisitions, she added, followed a similar path.
BlueBee Founder and CEO Hans Cobben had a background in financial services and technology, another sector that has strict compliance standards. According to Tousi, Cobben and his colleagues at BlueBee took cues from professional services industries in developing a scalable cloud platform for genomic data analysis.
Rather than customizing builds for each customer, BlueBee often adds a bit more functionality to its core platform with each new installation and can then bring that functionality back to existing clients.
"BlueBee, we saw it similar to the way we think about our sequencing platforms, similar to the way we've built our cloud solutions, or the way Dragen was built," Tousi said. The core remains the same, but users can activate modular components as needed.
BlueBee's core product is BlueBase, which aggregates, stores, and manages multimodal data, including genomics and imaging information.
Modules include BlueFlow, for managing analyses; BlueBench, a set of tools based on Jupyter notebooks for research and translational applications; and BlueBrain, an artificial intelligence module for generating insights from large datasets. All three are interoperable.
Illumina's informatics offering has long centered on its BaseSpace Sequence Hub, but over the last three to four years the company has worked to boost its software profile by releasing add-on modules including BaseSpace Cohort Analyzer, BaseSpace Correlation Engine, and BaseSpace Variant Interpreter. Most recently, Illumina has made several COVID-19 applications available for free to BaseSpace users.
Some of these add-on apps are being repackaged as the Illumina Analytics Platform (IAP), which the company is in the process of integrating with BlueBee and building directly onto Dragen, with an expected release later this year.
"We had on our roadmap that we would be building a comprehensive solution based on IAP when we found BlueBee," Tousi said, adding that in fact, the companies collaborated for a time before Illumina bought BlueBee.
"We saw what they had developed and what we had developed fit together like puzzle pieces," she said. "Now we feel like we've really advanced a comprehensive offering of tools and data warehousing."
With the self-developed IAP and now BlueBee, Illumina has what Tousi called "a scalable cloud data exchange" meant to accelerate and simplify analytics workflows for clients.
The BaseSpace name will remain on the software closest to the sequencers. While there may be some rebranding of other informatics offerings as the integrations progress, Tousi said she does not want to throw away the brand equity that Edico and BlueBee have built.