New Genestack Modules Address Omics Data Management, Gene Expression


CHICAGO (GenomeWeb) – Six years after Genestack first appeared on the scene by winning a Pistoia Alliance competition to build a platform for analyzing and storing next-generation sequencing data, the British company is addressing data fragmentation and inefficiencies in applying genomics to drug discovery.

At the 2018 Bio-IT World Conference in Boston last month, Genestack introduced two new modules for its bioinformatics platform.

One module is an omics data manager that facilitates data re-use and meta-analysis for collaboration on omics-related studies, while the other, a gene expression data miner, is meant to help researchers query and visualize transcriptomics data.

"The platform is an infrastructure product that allows large-scale R&D organizations to really get control of the data," said Genestack CEO Misha Kapushesky. "These two modules, one of them comes in at earlier stages, and the second one comes in [during] the report generation, the later stages of data interpretation" for data mining, he explained.

"The second [module] is for end-user biologists. Once they've collected an interesting set of transcriptomics data, they want to know what's going on," Kapushesky said. The data miner helps them determine what might be a good indication or drug target to pursue.

Kapushesky said that both of these modules have been developed in collaboration with big pharma customers, including Roche.

Kapushesky headed the functional genomics group at the European Bioinformatics Institute in the 2000s. There, he started and led the Gene Expression Atlas, which is now part of the Open Targets initiative, a joint project of EBI, the Wellcome Sanger Institute, GlaxoSmithKline, Takeda Pharmaceuticals, and Celgene.

Previous Genestack modules addressed transcriptomics. "Now we're adding to that additional omics data types," including variation and flow cytometry data, Kapushesky said. "We're looking at [a future involving] proteomics. We're looking at single-cell RNA sequencing."

The new omics data manager allows pharma companies and other biomedical researchers to capture and visualize samples and studies from multiple sources. "One of the big changes that's happening in the industry is that nobody does everything in house anymore. It's a major shift. You're seeing more and more collaboration and partnerships," Kapushesky said.

Particularly in pharma R&D and translational units, laboratory information management systems and electronic lab notebooks have not been designed for such distributed data environments, according to Kapushesky.

"For those guys, it's a real challenge. How do I coordinate everything while, at the same time, giving my scientists within my organization really good visibility of the data that I have so that I can become a lot more productive?" he said.

"If you are in a big therapeutic area and you want to know what data do we have in the organization on male patients over 50 with a particular variant and with a particular gene expression profile, how can I very quickly get that?" Kapushesky said.

Ordinarily, it takes time to compile all this data. "When you are designing a new experiment, you often don't know if this experiment has already been done within your organization," Kapushesky said.

That's where the data manager module comes in, addressing the issue of managing data at scale, according to Kapushesky.

"Today, it's not just managing your internal organizational data. It's your collaborations, and 'collaborations' means a whole lot of things right now," he explained.

"When you collaborate with UK Biobank or you collaborate with Genomics England, now you have to manage 100,000 samples. If you're collaborating with the Qatar Genome Programme, you got to manage millions of samples," Kapushesky said.

"You've got to build a system that is going to scale, that's going to provide you really good provenance and data governance features. It's got to be modular so you can add things to it." Now, a company like Cambridge, England-based Genestack also has to meet the European Union's General Data Protection Regulation, which took effect May 25, as well as other compliance requirements.

The Genestack platform can be either locally hosted or run in the cloud, so it integrates with both internal and external systems for scalability. "It's a modular set of architectural components for managing large-scale omics datasets, providing access to them with security in place," Kapushesky said. A series of application programming interfaces allows computational professionals to connect to various information systems. 

"We've taken the stuff that comes from the first module, all of the interesting data," Kapushesky said. "[We] process it, provide them with pipelines, provide the ability to hook in external pipelines, and then we provide a really nice, streamlined, beautiful interface where they can compose dashboards … and they get an immediate view on where each gene is active [in terms of gene expression]."

As it helps researchers identify and validate targets, the technology also frees up computational biologists, who no longer have to field questions that colleagues can now look up themselves because the information is organized.

Searches are reproducible. "You can have a saved view that will inform you if there is new data that has arrived," Kapushesky said.