PetaGene Sees AstraZeneca Deal as Validation of Genomic Data Compression Software

CHICAGO – PetaGene has recently expanded its global reach and demonstrated the value of its PetaSuite genomic data compression software, inking a pair of distribution deals in Asia and beyond in the last month or so and lining up a major pharmaceutical company as a client.

Earlier this month, PetaGene announced that AstraZeneca had chosen PetaSuite to compress genomic datasets for the drug company's Centre for Genomics Research, which is seeking to analyze 2 million genomes by 2026. The Centre for Genomics Research has processed about 200,000 datasets so far, generating more than a petabyte of data in the process.

Cambridge, UK-based PetaGene said that its PetaSuite lossless compression software can reduce storage costs and data transfer times by 60 percent to 90 percent over BAM and gzipped FASTQ files. The technology integrates with existing bioinformatics pipelines and storage infrastructure, including cloud platforms, according to the company.

The Global Alliance for Genomics and Health said that its CRAM file format has stored four petabytes of compressed genomic data in the last nine years. Chief Commercial Officer and Cofounder Vaughan Wittorff told GenomeWeb via email that customers have purchased enough PetaGene services in just the last 12 months to compress 21 petabytes of BAM and gzipped FASTQ files.

PetaGene expects its software to enable AstraZeneca's research center to losslessly compress its files by about 76 percent, reducing transfer times by a similar amount and effectively quadrupling storage capacity. AstraZeneca will be able to compress more than 200,000 BAM files in 24 hours with the help of PetaSuite, the software vendor said.
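The link between a 76 percent size reduction and a roughly fourfold gain in capacity is simple arithmetic: if each file shrinks to 24 percent of its original size, the same disk holds about 1 / 0.24 times as many files. The short Python sketch below works through that calculation. The function name `capacity_multiplier` is purely illustrative and is not part of any PetaGene product.

```python
def capacity_multiplier(reduction: float) -> float:
    """Return how many times more data fits on the same storage
    when every file shrinks by the given fraction (0.76 = 76 percent)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be at least 0 and less than 1")
    return 1.0 / (1.0 - reduction)


# The 76 percent figure cited for AstraZeneca's files.
multiplier = capacity_multiplier(0.76)
print(f"{multiplier:.2f}x")

# The 60 to 90 percent range the company quotes more generally.
low_end = capacity_multiplier(0.60)
high_end = capacity_multiplier(0.90)
print(f"{low_end:.1f}x to {high_end:.1f}x")
```

By the same formula, the 60 percent to 90 percent range quoted by the company corresponds to fitting between 2.5 and 10 times as much data on the same storage.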

Senior Vice President Michael Hultner, who serves as the general manager for PetaGene's US operations, said the AstraZeneca deal represents the first use of PetaSuite at that kind of scale.

"It's proven that we can work at that scale and we can also work in a production environment where we don't interrupt any of the normal flows or uses of the data," Hultner said. "It's a validation of our technology and of our product. We're offering them a significant compression advantage."

Notably, AstraZeneca got Secure Hash Algorithm 256-bit (SHA-256) certification on this technology, which is new for PetaGene, said Cofounder and CEO Dan Greenfield. SHA-256 is a cryptographic hash function; matching the hash of a file computed before compression against the hash computed after decompression confirms that the data came through intact.

PetaGene, founded in April 2016, had previously validated PetaSuite against the MD5 hash algorithm. Hultner said that SHA-256 essentially does the same thing, but by a different computing process.
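A hash-based integrity check of this kind can be illustrated in a few lines of Python. The sketch below is not PetaGene's implementation: it uses the standard library's gzip module as a stand-in compressor, and the helper names `sha256_hex` and `round_trip_is_lossless` are hypothetical. It shows only the general idea of hashing the data before compression and again after decompression, then comparing the two digests.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def round_trip_is_lossless(original: bytes) -> bool:
    """Compress and decompress the data, then compare SHA-256 digests.

    gzip is only a stand-in here; any lossless compressor could be
    checked the same way.
    """
    digest_before = sha256_hex(original)
    compressed = gzip.compress(original)
    restored = gzip.decompress(compressed)
    return sha256_hex(restored) == digest_before


# A small, made-up FASTQ-like payload for demonstration purposes.
sample = b"@read1\nACGTACGTACGT\n+\nFFFFFFFFFFFF\n" * 1000
print(round_trip_is_lossless(sample))
```

Swapping `hashlib.sha256` for `hashlib.md5` would give the older MD5 form of the same check.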

"They wanted additional integrity checks … so that when you try and access the file, you get bit-for-bit the same file as you put in. They wanted additional validation for things like legal compliance," Greenfield explained.

AstraZeneca also deployed PetaSuite on the Amazon Web Services cloud to "parallelize" the compression technology, Hultner said. This is what allows the drugmaker to process so many files per day.
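Compressing more than 200,000 files in 24 hours works out to more than two files per second, which is why spreading the work across many workers matters. The Python sketch below shows the general pattern of compressing independent files in parallel. It is an illustration only: gzip stands in for the real compressor, a local thread pool stands in for cloud instances, and the names `compress_one` and `compress_many` are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_one(payload: bytes) -> bytes:
    """Compress a single file's contents (gzip as a stand-in)."""
    return gzip.compress(payload)


def compress_many(payloads: List[bytes], max_workers: int = 4) -> List[bytes]:
    """Compress independent payloads concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_one, payloads))


# Eight small, made-up payloads standing in for separate genomic files.
files = [f"@read{i}\nACGTACGT\n+\nFFFFFFFF\n".encode() * 500 for i in range(8)]
compressed = compress_many(files)
print(len(compressed))
```

Because each file is compressed independently of the others, the same pattern extends from threads on one machine to many machines working through a shared list of files.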

Hultner said to expect a case study on this installation in a few weeks.

Earlier this year, the company introduced an encryption and access-management system called PetaSuite Protect, a product that was named "best of show" at the 2019 Bio-IT World conference in April. Featuring Advanced Encryption Standard 256-bit (AES-256) encryption, PetaSuite Protect gives data managers the ability to grant access to specific regions of genomic data files, rather than opening up whole files to all users.

Greenfield said that historically, a data steward would transfer an encrypted file and send along a decryption key, but not know what others were doing with the entire file. PetaSuite Protect has a "transparency layer" that requests a key for specific regions of the file for specific usages and monitors access and usage.

"The data steward has a complete audit trail as to the activity on there," Greenfield said. He expressed dismay that this has not become standard practice in genomic data management, but suspected that the technology had been lacking.

"Even if someone were to copy the file and try to use it later, if our library is in place, the data owner would be aware of that," Hultner said.

Greenfield said that this process is "transparent," meaning that end users with a decryption key will receive a regular BAM file, but someone with restricted access will only see certain regions of the genome. The rest will be blank.

He said that this technology helps organizations comply with privacy and security standards including HIPAA in the US and the General Data Protection Regulation (GDPR) in the European Union, as well as with CLIA standards.

Meanwhile, PetaGene continues to move into other geographic markets. Indian startup Genique Lifesciences in late September became the exclusive distributor of PetaGene's PetaSuite genomic data compression software in India. In mid-October, PetaGene named Dubai-based Alliance Global as the exclusive distributor of PetaSuite in the Middle East, Africa, Central Asia, and the South Asian countries of Pakistan, Bangladesh, and Sri Lanka.

"For some of these, we've made sales in the market previously, but we realize that it's best for a distributor to take over the customer relationships there and expand sales moving forward," Greenfield said.

Wittorff said that there are other territories, including Japan and South Korea, where PetaGene has done business and is now in the process of selecting exclusive distributors.

Financially, Greenfield said that the company is not yet profitable but is "approaching profitability."

The company late last year closed a $2.1 million round of financing, bringing its total venture capital haul to $3.2 million. PetaGene now has about 17 full-time employees and is hiring.

Greenfield said that management actually accepted less money than investors had offered because the company would like to control its growth.

"We are expanding and growing, and that increases our costs. Our investors would prefer us to just keep growing and expanding as much as possible and accepting as much money as possible," Greenfield said. "We don't necessarily see it that way."
