CHICAGO – Software company MemVerge believes that life sciences is a great starting point for what the startup calls "big memory computing," and genomic data processing is an ideal use case.
"Our vision is that memory will become bigger and more feature-rich so that it can satisfy the data-centric applications without using storage, thereby eliminating storage files and improving the overall performance and time to results," cofounder and CEO Charles Fan said at the Bio-IT World conference in Boston last week.
As Bio-IT World began, Milpitas, California-based MemVerge announced that the Huck Institutes of the Life Sciences at Pennsylvania State University had signed up to use the firm's Memory Machine virtualization software to accelerate genomic data processing for plant DNA research.
Memory Machine virtualizes various types of memory hardware so users can process massive datasets like genomic sequences without having to turn to speed-sapping storage platforms like hard drives and cloud hosts. Essentially, the data resides in computer memory at all times so there is no time lag when saving and retrieving large files.
In its marketing material, MemVerge said that traditional memory is too small and storage is too slow to handle datasets in fields like genomics. Memory Machine runs on Linux servers to deliver what Fan called "bigger memory at lower cost and higher performance." It also includes a suite of data services.
In bioinformatics — one of its three focus markets, along with financial services and cloud computing — the company is concentrating on secondary and tertiary analysis of big data.
"The root cause that we are trying to target is when data is bigger than memory, it doesn't fit into memory. It goes into storage. That's when things become slower," Fan explained. "As a result, you will experience slower time to results, higher infrastructure costs, and heightened risk of data loss."
Penn State Huck Institutes researchers are, among other things, analyzing plant DNA sequences in search of hardier crops to improve food supplies in developing regions of the world. MemVerge said that some of the analytics can take weeks to complete, and processes occasionally fail when large datasets overwhelm available computing memory.
Memory Machine, coupled with dynamic random-access memory and persistent memory, is addressing that problem.
"The innovative technology helps us achieve DRAM-like performance from a mixed memory configuration. This has not only dramatically increased our time to research findings but has also saved us considerable budget in the process," Claude dePamphilis, director of the Huck Institutes' Center for Parasitic and Carnivorous Plants, said in a statement.
Penn State has also adopted MemVerge's ZeroIO Snapshot, which parallelizes computation to accelerate performance and protect against failures.
ZeroIO Snapshot periodically captures the data held in memory, perhaps every 15 or 20 minutes. "When there's a failure, let's say if the system reboots or if you lose [access to] the machine … you can restart that process from that point and continue running," Fan said.
This is similar to the recovery feature available on any PC running Microsoft Windows, but suited to big data. A typical autosave option does not work well with genomic-scale datasets because files that large take too long to save, according to Fan.
"If every 20 minutes, you pause for 10 minutes [to save], that's a little bit expensive to do this," he said. "We can do this within a second to capture everything, and then if you need to move it to storage, it will just happen in the background. You don't have to stop your application."
In a video from a recent MemVerge webinar that Fan shared with GenomeWeb, researchers from the Translational Genomics Research Institute, or TGen, reported a 36 percent reduction in computational time for de novo genome assembly after adopting Memory Machine.
Also last week, the company said that it has teamed with single-cell genomics startup Analytical Biosciences to create what MemVerge called the first implementation anywhere of big memory computing for genomics research.
Beijing-based ABio has created a database of integrated and curated single-cell genomics data and developed core analytical and visualization tools.
ABio has applied its technology to cancer and COVID-19. According to MemVerge, ABio researchers found that when analyzing large single-cell datasets, the firm was spending as much as 58 percent of its computing time on moving data to and from storage.
The Chinese startup shifted to a "big memory computing environment" featuring DRAM, Intel's Optane persistent memory, and MemVerge Memory Machine software. This, according to ABio and MemVerge, eliminated 97 percent of transfers to and from storage, allowed data to load 800 times faster, and cut pipeline processing time by 61 percent, though those results have not yet been presented in peer-reviewed research.
"The big memory platform that MemVerge and Intel developed will lead to more efficient ways to gain greater insights and knowledge in disease mechanisms and improve healthcare," ABio founder Zemin Zhang said in a statement.
MemVerge was founded in 2017 by several veterans of the computer storage industry, though they created a software rather than a hardware company. The firm, which is backed by investors and partners in the tech industry including Cisco Systems, Intel, NetApp, and SK Hynix, introduced its product last year.
Fan said that MemVerge technology can accelerate workflow processes both in the cloud and for local computing installations.
"We think the hardware is going to advance … front where it becomes bigger, cheaper, and software is going to be built on top of it to deliver the necessary functionalities," Fan said. "Therefore more and more applications will be run out of memory."
"We came together thinking that the fundamental dichotomy between memory and storage is going to slow applications down and there must be an architecture that can converge the two into one," said Fan, who explained that MemVerge stands for "memory convergence."
MemVerge wants to help any user who deals with data that is both big and fast. Big means large capacity; fast can refer to the speed at which data is generated or processed.
"When you have the combination of big and fast, that is where we can potentially be a solution," Fan said. "We think any application that needs bigger memory or faster storage IO is a potential target application that we can help."
Clinical genomics fits that description, especially when someone needs speedy analysis of a genome to help with a diagnosis or treatment plan.
In the future, perhaps over the next five years, Fan is betting on the continued growth of cloud computing. He envisions technology like Memory Machine disaggregating computing resources such as central processing units, graphics processing units, memory, storage, and data networks.
"They each become a pool they can dynamically provision, which is not the case today for memory," Fan said. "I think some fundamental, key technology is needed both in hardware and software, and we are the software part of it."
Much of the talk at Bio-IT World was about managing data in hybrid cloud environments. Fan said that MemVerge could help there by snapshotting applications as users move data from in-house servers to the cloud and between clouds.
"We can enable software-composable memory and this application mobility and deployability across the hybrid cloud," Fan said.