NEW YORK (GenomeWeb) – Analytics Engines and Almac Diagnostics have been awarded an Innovate UK grant for an undisclosed amount to jointly develop a computational platform that automates pipelines and workflows for biomarker discovery, drug recovery and repositioning, and companion diagnostics.
The grant is part of a joint initiative of Innovate UK and the Biotechnology and Biological Sciences Research Council that is investing £2.5 million ($3.8 million) in feasibility studies aimed at exploiting the commercial opportunities of omics, systems biology, and other kinds of data.
For their particular project, Analytics Engines and Almac will combine their respective technologies and areas of expertise to design a scalable platform that's suitable for analyzing large, complex datasets. Analytics Engines develops storage and processing infrastructure for mining and extracting information from data in the life sciences and other contexts. Almac, for its part, is contributing its expertise in discovering, developing, and commercializing biomarkers, as well as developing bioinformatics pipelines.
The so-called Analytics Engines Big Data stack platform leverages technologies such as Hadoop HDFS and MongoDB to provide users with an efficient and scalable solution for combining and running automated pipelines on various types of data from both public and private repositories. The system uses data virtualization technology that lets users "look at data [from disparate sources] as being in a single location and to do cross comparison using multiple data types and tools," Austin Tanney, the company's head of life sciences, explained to GenomeWeb. "Processing and the data are kept close together so that rather than moving the data to the processing you are moving the compute to the data." The company markets both on-premises and cloud versions of the system to customers in a range of industries in addition to the life sciences. It does not disclose pricing.
Almac chose to work with Analytics Engines on the Innovate UK grant because of an existing partnership between the two companies that began a few years ago. "[Analytics Engines] were doing some work in ... accelerating software and big data analytics and because of the work we do in diagnostics dealing with large data, it was a bit of a natural fit for them to approach us as to ways they could potentially help us," Tim Davison, Almac's vice president of bioinformatics and biostatistics, told GenomeWeb.
As part of that initial partnership, the companies worked to identify bottlenecks in Almac's internal processes that, if automated, would speed up the company's analyses. Specifically, they looked at the processes that Almac uses to subtype high-throughput molecular data as part of its efforts to develop tests for conditions such as ovarian and breast cancer, according to Davison. Historically, subtyping tasks could take anywhere from 40 to 80 hours of compute time, but with Analytics Engines' help, Almac was able to cut that time to about four hours, he told GenomeWeb. "They vastly improved how we were able to turn around results and changed the cycle that we have for making decisions, [which] was a very big part of establishing the relationship between the two companies."
Bolstered by their success, the partners looked for other Almac processes that they could try to improve. Part of Almac's business involves identifying biomarkers that in some cases are involved in multiple diseases; for example, a biomarker that affects multiple cancer subtypes. The company needed a more efficient way to query its biomarkers of interest against public and proprietary datasets to identify different diseases that these markers might be involved in, or potentially to identify different prognostic groups. It wanted to be able to do this both prospectively (for instance, identifying potential dysregulation that occurs at the phenotype and genotype level) and retrospectively (for example, looking at failed drug trials and evaluating whether a different set of patients should have been selected for the study).
With the aid of the Innovate UK grant, "we are going to be really scaling out the process that we use for discovery and ... basically using it to generate data-driven hypotheses which you can then use to direct and focus your research and initiate collaborations potentially with companies or recover assets with other companies," Davison said. They'll also work on automating Almac's processes "to the point where the data is coming to us and telling us where the potential opportunities are for diagnostics, drugs, and companion diagnostics, and drug discovery, as opposed to us having to go forward and look and see where these might be," he said.
Currently, for a biomarker discovery project at Almac, researchers might, among other tasks, have to gather data from different sources, perform molecular subtyping, query biomarker tests against the data, and look for relationships between individuals in a group, such as similarities in signatures, phenotypes, outcomes, and clinical information. Completing these tasks could take researchers as much as a week, according to Davison. This partnership aims to reduce those times. Basically, "it's taking a lot of what we would do, removing manual steps and just introducing scale and consistency, something [that] will have an audit trail, [and] something that could be saved in an effective way so that it could be queried at a later date without rerunning," Davison said. Moreover, different bioinformaticians within the company may have different ways of doing things, so to ensure consistent results, part of the focus will be on putting together standardized ways of performing tasks and running analyses, he added.
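As a rough illustration of the idea Davison describes (standardized steps, an audit trail, and saved results that can be queried later without rerunning), a minimal Python sketch follows. It is not Almac's or Analytics Engines' software; the class `AuditedPipeline`, the steps `drop_missing` and `flag_high`, and the sample records are all hypothetical and chosen only for this example.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditedPipeline:
    """Hypothetical runner: fixed step order, audit log, results cached by input."""

    def __init__(self, steps):
        self.steps = steps      # list of (name, function) pairs, run in order
        self.audit_log = []     # one entry per executed step or cache hit
        self._cache = {}        # input fingerprint -> final result

    @staticmethod
    def _fingerprint(data):
        # Stable hash of JSON-serializable input, used as the cache key.
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def _record(self, step, key):
        self.audit_log.append({
            "step": step,
            "input": key,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def run(self, data):
        key = self._fingerprint(data)
        if key in self._cache:
            # Same input seen before: return the saved result without rerunning.
            self._record("cache_hit", key)
            return self._cache[key]
        result = data
        for name, func in self.steps:
            result = func(result)
            self._record(name, key)
        self._cache[key] = result
        return result


def drop_missing(samples):
    # Toy quality-control step: discard samples with no measurement.
    return [s for s in samples if s["expression"] is not None]


def flag_high(samples, threshold=1.0):
    # Toy analysis step: mark samples above an arbitrary threshold.
    return [{**s, "high": s["expression"] > threshold} for s in samples]


pipeline = AuditedPipeline([("drop_missing", drop_missing),
                            ("flag_high", flag_high)])
samples = [
    {"id": "s1", "expression": 2.5},
    {"id": "s2", "expression": None},
    {"id": "s3", "expression": 0.4},
]
first = pipeline.run(samples)    # executes both steps
second = pipeline.run(samples)   # served from the cache
```

Because every analyst would call the same `run` method with the same ordered steps, results stay consistent across users, and the log records which steps touched which input and when. A production system would presumably persist the cache and log to durable storage rather than keep them in memory.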
Over the next 18 months, the researchers will work on implementing Almac's internally developed pipelines and workflows on Analytics Engines' infrastructure. This includes pipelines for data quality control and for exploring technical factors and endpoints, Davison said. They'll also implement Almac's pipelines for cleaning data and a molecular subtyping workflow that uses unsupervised or semi-supervised approaches to explore data structure with respect to sample or patient grouping, phenotypes, ontologies, and targets. Additionally, the partners will implement a tool that lets researchers discover and develop biomarkers that have clinical utility, statistical validity, biological relevance, and analytical stability.
The platform will be used to support Almac's internal research and development efforts as well as fee-for-service work focused on biomarker discovery and validation that the company does with pharmaceutical and biotechnology companies, Davison said.
Analytics Engines also plans to offer capabilities developed through the partnership commercially as part of its existing portfolio specifically for the life sciences, Tanney said. "At the moment, we can implement pipelines for people or if they have the experience and expertise themselves, they can use the stack and implement pipelines. ... What we will have as a result of this is something that enables clients to build their own pipelines more easily," he said.