CHICAGO – Distributed computing software company WANdisco has effectively entered the genomics market. Last month, the company said it had donated access to its technology platform to the Korean Bioinformation Center, or KOBIC, the bioinformatics arm of the Korea Research Institute of Bioscience & Biotechnology.
Founded in Sheffield, UK, in 2005 and headquartered there as well as in San Ramon, California, WANdisco helps companies and organizations migrate very large datasets from local installations to the cloud and then manage the flow of data following the migration.
"The end game for us is to be the broker of all data that happens between clients," said David Richards, chairman and CEO of WANdisco.
WANdisco specializes in what it calls "live data," meaning that no matter how employees and partners are accessing an organization's data, the information is always up to date and reflects a single "version of the truth," according to Van Diamandakis, the company's senior VP of marketing. The firm has also referred to its operations as "active data replication."
KOBIC, maker of Bio-Express, a cloud-based service platform for genomic and proteomic research, launched a COVID-19 research portal in March 2020.
KOBIC had built the big-data analysis capabilities of Bio-Express on the Apache Hadoop Distributed File System and its genomic analysis applications on the Linux-based Lustre file system, and had trouble transferring information between the two environments. As the volume of COVID-19 data and the number of Bio-Express users exploded last year, more than 40 percent of the platform's processing time had to be dedicated to replicating some 40 TB of data each day, according to WANdisco.
In a statement, KOBIC researcher Kun-Hwan Ko said that WANdisco has helped the organization accelerate bidirectional file transfer between Hadoop and Lustre by 13 times, while cutting the average time for Bio-Express genomic analysis by 30 percent. KOBIC declined to answer additional questions about its use of WANdisco.
The donation to KOBIC is WANdisco's way of offering its services pro bono to contribute in some small way toward the global effort to fight the pandemic, according to Diamandakis. "We wanted to offer it for free and to help speed up the research for COVID," he said.
To that end, WANdisco announced earlier this year that it would offer free access to its platform for COVID-19 research purposes. KOBIC is the first to take the company up on the offer.
Diamandakis said that WANdisco's initial aim had been to join the fight against COVID-19 rather than to make a full-fledged entry into the biotech industry.
"We knew our software could dramatically speed up transferring data from on-premise into the cloud and replicating that data so that researchers could work on it real time as it happens," Diamandakis said.
WANdisco was founded in 2005 after a chance meeting between Richards and cofounder Yeturu Aahlad, who has a Ph.D. in distributed computing.
Richards explained that the WANdisco name stands for wide-area network distributed computing, and the company organized its own software development in a distributed manner. "If you have engineers in India, China, the United Kingdom, and the United States, how do you get them to collaborate in different data centers on the same piece of data at the same time?" Richards said.
WANdisco holds about three dozen patents in the area of distributed computing, and has patented a method that keeps track of changes made while a migration is in progress, so the move is more seamless.
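The article does not describe how the patented method works, so the following is only a generic illustration of the idea of tracking changes during a migration, not WANdisco's technique. It is a minimal Python sketch in which a bulk copy is taken from a point-in-time snapshot and a change log is then replayed until the target has caught up. The `SourceStore` class, the `migrate` function, and the file paths are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SourceStore:
    """Toy in-memory 'on-premises' store that records every write it receives."""
    files: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)

    def write(self, path, data):
        self.files[path] = data
        self.change_log.append(path)  # remember what changed, in order


def migrate(source, target, writes_during_copy=()):
    """Copy source.files into target (a dict), then replay changes made meanwhile."""
    log_position = len(source.change_log)  # only changes after this point matter
    snapshot = dict(source.files)          # bulk copy works from a point-in-time view

    # Simulate clients that keep writing while the bulk copy is running.
    for path, data in writes_during_copy:
        source.write(path, data)

    target.update(snapshot)                # the (slow) bulk copy

    # Catch-up phase: re-copy every path that changed since the snapshot.
    while log_position < len(source.change_log):
        path = source.change_log[log_position]
        target[path] = source.files[path]
        log_position += 1
    return target


source = SourceStore()
source.write("genome/sample1.vcf", "v1")
source.write("genome/sample2.vcf", "v1")

cloud = migrate(
    source,
    target={},
    writes_during_copy=[("genome/sample1.vcf", "v2"), ("genome/sample3.vcf", "v1")],
)
print(cloud == source.files)  # True: the target reflects the writes made mid-copy
```

A real system would also have to handle deletions, ordering across many writers, and failures partway through, none of which this toy version attempts.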
"For people like KOBIC, it was about how [to] migrate this data that's changing all the time from on-premises into their purpose-built cloud for genome research and make that data available to everybody to do the research," Richards said.
He explained that the company's business model is to license its IP to cloud vendors, so it does not need to build a large sales organization of its own.
Microsoft, for example, has chosen WANdisco's offering as a preferred product for data migration to the Azure cloud, and Amazon has done something similar for the Amazon Web Services cloud.
"It's why [cloud data management firms] Databricks and Snowflake partner with us to move machine learning data into their cloud applications," Richards added. "So we're playing a very important role in businesses' ability to adopt cloud computing."
Diamandakis said that WANdisco is becoming a "native service" in the Azure user portal. "Microsoft customers can just go in there and buy the service and migrate [their] data and the bill shows up on the Microsoft bill," he explained. He added that WANdisco has similar agreements with Amazon Web Services and the Google Cloud Platform, though the Azure integration is furthest along.
WANdisco announced the partnership with Microsoft last June and has since moved from private preview to open beta with the Azure integration. A general release is expected within a couple of months.
According to Richards, WANdisco has never taken any venture capital or external private equity. The company went public on the London Stock Exchange in 2012, and he said that the capitalization table for the initial public offering consisted entirely of the founders.
WANdisco started out managing source code for software development, but in 2014 and 2015, it moved to migration of large datasets from on-premises installations to cloud platforms.
"You can't really use on-premises Hadoop, which is batch-based, for real-time analytics and machine learning in the modern era of commerce," Richards said. "We realized that that was going to change and we had to pivot the business as a public company to then focus on the movement of data to cloud."
The firm did originally build products for Hadoop, and Richards said that the effort was not wasted when the focus shifted. "We had great expertise then in how you build products necessary to move humongous datasets on premises to [the] cloud," he said.
Historically, the company's customer base has been in financial services, telecommunications, government, and on the business side of healthcare. Highmark Health subsidiary HM Health Solutions, which provides technology and business services to payers, is a client, as are coffee giant Starbucks and internet domain registrar GoDaddy.
Health data presents regulatory challenges not found in other industries. WANdisco said that to comply with HIPAA, US healthcare customers required the company to modify contracts to ensure that no patient-identifiable data is visible to the vendor and that no data logs are sent offshore without permission.
South Korea updated its Personal Information Protection Act at the beginning of 2020, and WANdisco installed its software on KOBIC servers in such a way that it does not see any patient data controlled by the Korean research organization. Similar rules apply in Europe under the General Data Protection Regulation, according to the firm.
In life sciences, Richards sees considerable potential from the biotech sector. "Almost all of [the big biotech companies] built large-scale clusters for data analysis on [their] premises," Richards said. For this reason, WANdisco is trying to set up a sales pipeline in life sciences now, particularly among biopharmaceutical companies looking to move massive datasets from their premises to the cloud to support real-time analytics and machine learning. "Those applications really can only exist in the cloud," Richards said.
The limiting factor has been computing power, not disk storage. Handling the analytics needs of some retail companies, for example, requires thousands of CPUs at peak times. "Michael Dell famously proved that storage is actually more expensive in the cloud. You still get economies of scale, but people are going to the cloud because of compute," Richards said.
"If I'm going to run a machine learning algorithm that needs 3,000 CPUs for 20 minutes, and then I'm going to shut the thing down and I'm not going to use them for another month, [that is] not a use case for on-premises," Richards said. This dynamic presents a $300 billion annual market opportunity just to move data to the cloud for advanced, real-time analytics, he added.
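The arithmetic behind Richards' example can be made concrete. The short Python calculation below compares the CPU-hours such a job actually consumes with the CPU-hours an always-on cluster of the same size would provide over a month. The 30-day month and the single run per month are assumptions made for illustration, and no prices are implied.

```python
# Figures quoted by Richards.
cpus = 3_000
job_minutes = 20

# Assumption for illustration: the job runs once in a 30-day month.
hours_in_month = 30 * 24

cpu_hours_used = cpus * job_minutes / 60   # what the job consumes
cpu_hours_owned = cpus * hours_in_month    # what an always-on cluster provides
utilization = cpu_hours_used / cpu_hours_owned

print(f"CPU-hours used:  {cpu_hours_used:,.0f}")   # 1,000
print(f"CPU-hours owned: {cpu_hours_owned:,.0f}")  # 2,160,000
print(f"Utilization:     {utilization:.3%}")       # 0.046%
```

Under these assumptions, purchased hardware would sit idle more than 99.9 percent of the time, which is the point of the quote.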
Genomics research falls right into that wheelhouse because of the size of its datasets and the intermittent nature of computing needs, according to Richards.
WANdisco is not daunted by file sizes, even though an annotated whole-genome sequence can take up a full terabyte of space. "A 1 TB dataset is nothing for us," Diamandakis said, adding that the firm is currently migrating 13 petabytes of data for a telecommunications client into a multicloud environment.
Nearly all of WANdisco's customers, including KOBIC, are looking to move from retrospective analytics to real-time predictive analytics, which relies on constantly updated data, according to Richards. E-commerce sites, for example, have long based product suggestions on previous purchases, which he described as looking back rather than looking forward.
"Real-time analytics reverses that. It's not a look back. It's a look forward," he said, with suggestions based on a shopper's searches on a specific site and even elsewhere on the internet.
"A batch-based look back is not going to solve a problem. That is not a technique that data analysts want to use today," Richards explained. "It means that the infrastructure that most people have built isn't fit for purpose, which is why it has to move to the cloud."
He said that organizations simply cannot run artificial intelligence and machine learning on in-house infrastructure because those processes require intermittent access to thousands of CPUs. The same principle applies to COVID-19 research, like what KOBIC is supporting, he added.
Now that KOBIC is on board, WANdisco sees more potential in genomics and biotech. According to Richards, the company has seen interest or has even started pilots around drug discovery with several large pharma companies, which he declined to name.
Overall, WANdisco is trying to become a utility of sorts for data management. In utility computing, compute time is the biggest cost, Richards explained. "A marginal change in the unit price of compute, just like a marginal change in the price of oil, massively impacts businesses," he said, which leads him to believe that futures markets will develop among cloud providers. That will require arbitrage, a market that WANdisco would like to be part of. "The end game for us is to be the broker that arbitrarily moves that data between clouds," he said.
Richards added that "virtually all" pharmaceutical and biotech firms would have to pay close attention to cloud arbitrageurs because the cost of compute time is "going to be a significant risk on the balance sheet."
That means that users will likely have data in multiple clouds simultaneously, and companies like WANdisco will have to ensure that all of their customers' datasets are up to date, regardless of location.