NEW YORK (GenomeWeb) – The UK's Genome Analysis Center (TGAC) is partnering with Optalysys to develop a commercial platform for searching DNA sequences that will be based on optical processing technology developed by the Yorkshire, UK-based firm.
The partners have received £0.5 million ($769,000) from Innovate UK to develop the so-called Genetic Search System (GeneSys), which is expected to go on sale within the next two years. The system will couple Optalysys' optical technology with the widely used Blast software, enabling researchers to run large-scale DNA sequence searches more efficiently and more cheaply than is possible with existing high-performance computing (HPC) resources.
Some of those savings will come from the reduced capital and energy requirements of the planned infrastructure. Instead of electronic circuits, Optalysys' technology uses light to perform "processor intensive mathematical functions in parallel" at high speed and resolution. As a result, it can potentially provide "multi-exascale levels of processing" using power from a standard mains supply and without special cooling infrastructure, according to the company.
Timothy Stitt, TGAC's Head of Scientific Computing, told GenomeWeb that researchers at the center routinely run Blast sequence alignments on the center's in-house HPC resources. While these resources are well suited to large-scale Blast calculations, they are expensive to run and cool. Because GeneSys will run on a standard mains power supply, it will use much less energy than current HPC systems require. For comparison, TGAC's existing HPC resources can consume up to 130 kilowatts of power, including the energy used for mechanical cooling. With GeneSys, TGAC expects to cut that figure by 90 to 95 percent, Stitt said.
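Taking Stitt's figures at face value, and assuming the 130-kilowatt peak as the baseline, the expected savings work out as follows:

$$
130\ \text{kW} \times (1 - 0.90) = 13\ \text{kW}, \qquad 130\ \text{kW} \times (1 - 0.95) = 6.5\ \text{kW}
$$

In other words, a GeneSys-based setup would draw roughly 6.5 to 13 kilowatts where the existing HPC resources draw up to 130.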
Furthermore, the planned system, which will be "at least as accurate as current systems," will be much smaller and cheaper than existing systems, according to Optalysys CEO Nick New. The reduced cost and size, he added, will bring "the ability to perform this kind of analysis into the hands of a much broader base of companies and institutions who previously were unable to do so due to capacity constraints and prohibitive running costs."
TGAC was also drawn by Optalysys' expectation that it can deliver an exaflop machine that still runs on standard mains power within the next five or six years, Stitt said. According to its website, Optalysys expects to launch its first products in 2017; last month it launched a Series A funding round to support its commercialization activities over the next two years. Its product portfolio will include model simulation units with an initial specification of nine petaflops, to be upgraded to deliver multiple exaflops by 2020. The company also intends to offer so-called big data analysis units by 2017; these will have an initial specification of 1.3 petaflops and are expected to deliver several hundred petaflops by 2020.
According to the company, increasing performance will not affect the systems' power requirements, which would be a boon for genomics researchers. Matching input sequences against databases of nucleotide or amino acid sequences is critical for assembling, annotating, and comparing genomes, whether to better understand diseases such as cancer or to identify plant pathogens and develop preventive treatments against them, among other projects. As sequencing technologies evolve and the cost of generating data drops, the public databases that hold this information double in size every 18 months or less, and searching them requires large HPC resources that consume vast amounts of energy for power and cooling.
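That growth rate compounds quickly. Writing $S_0$ for a database's current size and assuming a doubling time of exactly 18 months (1.5 years), its size after $t$ years is:

$$
S(t) = S_0 \cdot 2^{t/1.5}, \qquad S(6) = S_0 \cdot 2^{4} = 16\,S_0
$$

So a database growing at that rate becomes sixteen times larger within six years, and a shorter doubling time makes the growth faster still.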
Although Optalysys' technology has been used in other fields, such as defense, this is the first time it will be applied to bioinformatics, and to sequence searching specifically, according to the partners. Other potential applications include studies of weather turbulence and ocean movements, according to the company, as well as jet engine development and radiology, among others.
According to its website, Optalysys' technology relies on proprietary techniques as well as diffraction and Fourier optics to "combine matrix multiplication and optical Fourier transforms into more complex mathematical processes." It uses liquid crystal patterns, instead of lenses, to focus "the light as it travels through the system."
In the system, "numerical data is entered into liquid crystal grids and is encoded into the laser beam as it passes through. The data is then processed together as the beam is focused or passes through the next optical stage," the company explains on its website. "Increasing the resolution of the data is achieved through adding more pixels to the SLM [spatial light modulator], but the process time, once the data is addressed, remains the same regardless of the amount of data being entered."
Over the next two years, TGAC and Optalysys will develop a GeneSys prototype and test it on the Human Microbiome Project's Mock Community datasets, Stitt said. In the system, genomic sequences will be encoded as images on liquid crystal displays, and specialized lenses will compare one image with another to identify differences, Stitt explained. The final output will be a similarity plot with peaks indicating where the corresponding images diverge from each other.
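Optical comparisons of this kind rest on the correlation theorem: correlating two signals is equivalent to multiplying one's Fourier transform by the complex conjugate of the other's, a step Fourier optics can perform with light. The Python sketch below does the same computation electronically, as an illustration of the general principle only and not of Optalysys' proprietary method. It assumes a simple one-hot encoding (one 0/1 channel per base, standing in for the liquid crystal images), uses a hand-written FFT so it needs no third-party packages, and its function names are hypothetical. In this textbook formulation each value counts the bases that match when the query is placed at a given offset, so the highest peak marks where the query best agrees with the reference.

```python
import cmath

BASES = "ACGT"


def one_hot(seq, length):
    """Encode a DNA string as four 0/1 channels (A, C, G, T), zero-padded to `length`."""
    channels = [[0.0] * length for _ in BASES]
    for i, base in enumerate(seq.upper()):
        if base in BASES:  # ambiguous bases such as N are left as all zeros
            channels[BASES.index(base)][i] = 1.0
    return channels


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def similarity_plot(reference, query):
    """Count matching bases when `query` starts at each offset of `reference`.

    Correlation theorem: corr = IFFT(FFT(ref) * conj(FFT(query))),
    computed per base channel and summed over the four channels.
    """
    n = 1
    while n < len(reference) + len(query):
        n *= 2  # zero-pad so the circular correlation does not wrap around
    scores = [0.0] * n
    for ref_ch, qry_ch in zip(one_hot(reference, n), one_hot(query, n)):
        spectrum = [a * b.conjugate() for a, b in zip(fft(ref_ch), fft(qry_ch))]
        for k, value in enumerate(fft(spectrum, inverse=True)):
            scores[k] += value.real / n  # apply the 1/n inverse-FFT scale
    # Keep offsets where the query starts inside the reference; round away float noise.
    return [round(s) for s in scores[:len(reference)]]
```

For example, `similarity_plot("ACGTACGGTACG", "GGTA")` returns `[0, 3, 1, 0, 0, 1, 4, 1, 0, 0, 1, 1]`: the peak of 4 at offset 6 is the exact occurrence of GGTA, and the 3 at offset 1 is a near match with a single mismatch.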
As part of the two-year development process, the partners will build tools to convert test datasets, reference genomes, and query sequences from the Fasta file format into images and back again once the analysis is complete, he said. They will also benchmark Blast on GeneSys against Blast on typical HPC resources, and compare GeneSys' output with the results normally obtained from a typical HPC system to ensure that they match.
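The partners have not described their conversion format. As a purely hypothetical illustration of a round trip from Fasta text to images and back, the sketch below parses Fasta records, maps each base to an assumed grayscale intensity, wraps the pixels into rows of a fixed width, and decodes the image back into the original sequence. The intensity levels, the row width, and the function names are all assumptions, and only the bases A, C, G, and T are supported.

```python
# Hypothetical grayscale level for each base; 0 is reserved for padding.
LEVELS = {"A": 64, "C": 128, "G": 192, "T": 255}
BASE_FOR_LEVEL = {level: base for base, level in LEVELS.items()}


def parse_fasta(text):
    """Return {header: sequence} for each record in a Fasta-formatted string."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def to_image(seq, width):
    """Wrap a sequence into rows of `width` pixels, zero-padding the last row."""
    pixels = [LEVELS[base] for base in seq]  # raises KeyError for bases other than ACGT
    pixels += [0] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def from_image(image):
    """Decode an image produced by to_image back into a sequence, skipping padding."""
    return "".join(BASE_FOR_LEVEL[p] for row in image for p in row if p != 0)
```

For example, `to_image(parse_fasta(">ref\nACGTAC\nGGTACG\n")["ref"], 5)` yields three rows, the last being `[128, 192, 0, 0, 0]`, and `from_image` turns that image back into `ACGTACGGTACG`. Reserving 0 for padding keeps the decoding step unambiguous.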
The initial goal of the project will be to develop a small, energy-efficient coprocessor that could be connected to a standard compute node in a cluster, but ultimately the partners plan to develop a small portable device that could be used in combination with portable sequencing technology, Stitt said.
Moreover, since "you can scale up this device very easily and cheaply just by adding more liquid crystal displays ... you can put as much data into this system as you want and it will do the comparison against everything all at the same time," he said. "You could literally fit many reference genomes in the device and also lots of query sequences and compare them all at the same time" with no additional processing time required. That's not true for traditional HPC resources, where "if you double the amount of data you need to process, typically you'll double the time it takes to process it," he said.