NEW YORK (GenomeWeb) – Researchers at EMC's research and development center in Russia have developed an on-premise cloud appliance that provides local compute power and resources for omics data management and storage.
Andrey Pakhomov, a senior solutions manager at EMC R&D in Russia, presented the platform at last month's Bio-IT World Conference in Boston, noting that the EMC platform was designed to address common IT problems that genetics labs face. Researchers in these contexts often have to manage lots of lab equipment, collaborate with geographically distant partners, and handle information from multiple patients and experiments, he said.
There are also challenges with storing and properly managing the datasets, including ensuring that the right systems and people have access to the right datasets, as well as challenges with processing data in a timely and cost-effective fashion, Pakhomov told GenomeWeb after the conference. Furthermore, labs must also find ways to merge legacy tools and workflows with newer systems and solutions as these come on the market.
EMC's offering, he said, can handle these computation needs, and can scale as needed to meet customers' compute demands. Customers can install bespoke analysis pipelines and algorithms to fit their respective use cases and needs. They can also store both raw data and metadata in the system and easily implement internally built analysis pipelines and workflows. The system also supports secure collaborations, allowing users to restrict access to shared datasets. Moreover, it offers an on-premise alternative for customers in countries like Russia who are unable, by law, to use public cloud solutions to analyze potentially sensitive personal information. It's possible to create a hybrid version of the cloud that provides both on- and off-premise options if that's what customers want, Pakhomov said.
EMC currently has a limited deployment arrangement with Parseq Lab, a Russian genetic diagnostic company, which has coupled the EMC infrastructure with internally developed algorithms and pipelines to analyze data from VariFind, a validated next-generation sequencing-based test for screening newborns for genetic diseases. Pakhomov told GenomeWeb that for now EMC plans to continue testing the platform with local Russian medical centers, using a few pipelines and tools to evaluate its efficacy in real-world settings.
Parseq (previously Sequoia Genetics) first announced in 2012 that it would use infrastructure developed by EMC to power informatics tools it developed for personalized medicine.
Earlier this year, Parseq announced that it received financing from the St. Petersburg Ministry of Health in Russia to offer VariFind. The test screens newborn children with positive biochemical tests for cystic fibrosis, galactosemia, and phenylketonuria. VariFind includes a diagnostic assay based on targeted enrichment technology, an internally developed mutation database of pathogenic variants associated with severe cases of the conditions, and software for processing and visualizing sequence data as part of the analysis process.
In his Bio-IT presentation, Anton Bragin, head of the bioinformatics department at Parseq, said that when considering the informatics infrastructure for VariFind, Parseq sought a solution that would make it possible to integrate high-performance technologies into routine clinical workflows, provide access to current and relevant datasets, and organize tools, protocols, and data.
In addition to those benefits, Bragin told GenomeWeb following the meeting that the EMC infrastructure provides useful software services that help users customize and create applications specific to their needs. The EMC system handles all the lower-level functions such as data and metadata storage, tracking, and extraction, freeing bioinformaticians to focus on developing analysis applications. Algorithms and pipelines can be easily implemented in a number of ways, including packaging them in Docker containers and uploading them to the platform, he said. The system also features a simple, user-friendly interface through which clinicians can access and review data from the VariFind test. In addition, since the system is local, users don't have to tax their sometimes limited bandwidth trying to transfer large quantities of data to web-based clouds, he said.
Moreover, the platform isn't limited to just NGS applications, Bragin noted. Parseq is already exploring the potential of using the system to power applications for human remains identification and to manage biobanking data and samples.