This week, Terascala said that it has released a new version of the Intelligent Storage Bridge (ISB), a high-performance data-movement appliance which the company first introduced last fall to help its customers, especially those in the genomics arena, move large quantities of data quickly and efficiently.
Terascala's ISB supports Lustre storage solutions provided by various vendors, allowing users to move data to and from scratch storage on a wide range of high-performance computing and enterprise storage solutions. The latest version of the software, according to the company's website, includes application programming interfaces for integrating and managing third-party workflow management solutions or internally written scripts and applications. It also includes gateways that allow customers that use the Common Internet File System (CIFS) and the Network File System (NFS) to access their scratch storage data, as well as new security protocols to ensure that the data is protected as it moves, Steve Butler, Terascala's CEO, told BioInform.
In addition to the ISB, Terascala also sells a high-performance storage appliance. The two solutions are sold separately, although they can be purchased together. Entry-level pricing for the ISB starts at around $50,000 for a fully redundant four-server system, including the Terascala operating system software, that can move data at one gigabyte per second, Butler said. According to technical specifications provided on the company's website, a minimum base configuration has three nodes, 48 gigabytes of memory, and 500 megabytes per 1U active node. Entry-level pricing for the company's high-performance appliance starts at around $200,000, which covers 200 terabytes of usable storage and runs at about six gigabytes per second, Butler said.
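For a rough sense of what the quoted throughput figures mean in practice, the sketch below estimates how long a sustained transfer would take at each rate. It assumes decimal units (1 terabyte = 1,000 gigabytes) and ideal sustained throughput with no protocol or network overhead. The function name `transfer_hours` and the 10-terabyte example dataset are illustrative assumptions, not figures from Terascala.

```python
def transfer_hours(data_terabytes: float, rate_gb_per_s: float) -> float:
    """Hours needed to move data_terabytes at a sustained rate_gb_per_s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    gigabytes = data_terabytes * 1000  # assumption: decimal units, 1 TB = 1,000 GB
    return gigabytes / rate_gb_per_s / 3600  # seconds -> hours


if __name__ == "__main__":
    # Rates quoted in the article; the 10 TB dataset size is hypothetical.
    for label, rate in [("ISB entry system", 1.0), ("storage appliance", 6.0)]:
        print(f"{label}: {transfer_hours(10, rate):.2f} h to move 10 TB")
```

Real-world times would likely be longer, since the estimate ignores file-system metadata overhead and contention from other jobs.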
Terascala is an OEM company, so its solutions are sold through its partners Dell and NetApp. According to Terascala's website, Dell's version of its appliance uses Dell PowerEdge R620 servers and Dell PowerVault MD3260 and MD3220 storage arrays in fully redundant configurations. The NetApp/Terascala High Performance Storage Appliance has similar infrastructure, except that it uses NetApp 2600 and 5460 storage controllers, also in fully redundant configurations.
Boston-based Terascala was founded in 2005 and has raised more than $30 million in funding, including investments from Ascent Venture Partners and Intel Capital. Terascala's products are sold to customers in finance, life sciences, manufacturing, media and entertainment, and the oil and gas/energy business. Its solutions are used in places such as the US Environmental Protection Agency, Sandia National Laboratories, and Tradeworx, a financial technology company. However, the bulk of its business comes from the life sciences, with 100 percent of that coming specifically from customers involved in genomics-based activities who are finding that their existing solutions cannot handle increasing quantities of genomic data, Butler told BioInform.
A recurring theme from current customers is that "we need to have the fastest possible storage so that we can enable our bioinformatics folks to run their applications with massive datasets but it's got to be [easy to use and] it's got to be plug and play" and not require a large IT team to maintain and run, he said. "Our solution fits very nicely into that critical requirement, so we look at genomic sequencing as a really great fit [and] we see it through the pipeline of opportunities that are coming up right now."
Aside from a "competitive price point," Terascala also believes that its patented ISB solution is unique to the market and that when it is purchased as a bundle with the company's high-performance storage appliance, the two systems provide a solution that is unlike any currently available in the HPC arena, Butler said. Furthermore, "we've taken an open source [Lustre] file system that's the fastest in the world and we've turned [it] into an appliance so that companies benefit from open source" and at the same time designed a system that's easy to use, he said. "That's our secret sauce."
Customers have already begun using the updated version of the ISB. One of these is the Translational Genomics Research Institute (TGen), which was one of the first to deploy the previous version of the ISB when it became available last September. TGen also uses Terascala's high-performance storage appliance and an HPC compute cluster that Dell manufactures and sells.
TGen first learned of the Terascala solution through its partnership with Dell, James Lowey, VP of technology for TGen, told BioInform. The institute was updating the Dell compute cluster that it was using to support a pediatric neuroblastoma sequencing project and was looking for a way to shorten the time needed to run the pipeline it was using for the project, with data movement being one of the bottlenecks.
Lowey explained that TGen worked with Terascala to develop a CIFS gateway that would make it possible to move sequence data directly from the sequencing instrument into TGen's Lustre file system, a step that helped save some time. However, "we didn’t have budget to buy a multi-petabyte Lustre system so the data could only reside on [our current] Lustre system for a limited amount of time."
Terascala's ISB appliance offered a more cost-effective solution to the problem by providing a way to efficiently move data across multiple storage tiers.
"We just incorporate[d] ISB into the flow and it can push data automatically onto [a] data archive system or long-term storage," Lowey said. "And having the ability to scale horizontally really makes it compelling because it's capable of pushing a lot more bits through than just running a traditional CPU over NFS that we would have been doing."
The ISB has been in place for about six months and, along with the HPC system from Dell, has helped TGen cut the amount of time required to run its computations from seven to nine days down to between four and ten hours, according to Lowey. Furthermore, it helps to have a commercial partner who can handle system maintenance, updates, and repairs, he added, thus freeing TGen employees from having to take on that responsibility. "Vendors like that give you that ability to sleep better at night," he said.
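Taken at face value, those runtimes imply a speedup of somewhere between roughly 17 and 54 times. A minimal sketch of the arithmetic follows; it pairs the shortest old runtime with the longest new one for the lower bound, and the reverse for the upper bound. That pairing and the helper name `speedup_range` are assumptions for illustration, since TGen did not say which runs correspond to which.

```python
HOURS_PER_DAY = 24


def speedup_range(old_days, new_hours):
    """Return (lowest, highest) speedup for old runtimes in days vs. new runtimes in hours."""
    old_lo, old_hi = (d * HOURS_PER_DAY for d in old_days)
    new_lo, new_hi = new_hours
    # Lowest speedup: fastest old run vs. slowest new run; highest is the reverse.
    return old_lo / new_hi, old_hi / new_lo


low, high = speedup_range(old_days=(7, 9), new_hours=(4, 10))
print(f"{low:.1f}x to {high:.1f}x")  # prints "16.8x to 54.0x"
```

The gain reflects the combined effect of the new Dell cluster and the ISB, so it cannot be attributed to data movement alone.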
Dell has also incorporated the ISB appliance into its Genomic Data Analysis platform, a new high-performance computing system it developed in partnership with TGen and Terascala that is optimized to meet the performance and storage requirements of the genomics research market. It doesn't include specific genomic data analysis software, but it is designed to provide everything needed to run bioinformatics pipelines in a single rack, Walker Stemple, Dell's HPC solutions product manager, told BioInform. The company launched the system in the US last June at the Dell World Exposition under the moniker Active Infrastructure for HPC Life Sciences but rebranded it to make its purpose clearer, he said. Dell will demonstrate the solution at other conferences later this year, including the annual Bio-IT World conference in April.
According to a whitepaper from Dell, the system's components include Dell PowerEdge R620 servers, a Dell PowerEdge R820 (a so-called fat node that provides space for running memory-intensive applications), and Dell PowerVault MD3260 and MD3220 storage arrays. It has a 360-terabyte Lustre file system and 180 terabytes of NFS-accessible storage that serves as the primary storage unit for directories and application data. It lets users move sequence data directly from their instruments into the Lustre file system for processing using a CIFS gateway before the data moves to the primary storage space for analysis. According to the company, the 32-node cluster system can process up to 37 genomes per day. That figure is based on benchmark tests using the bcbio-nextgen analysis pipeline that Dell did in partnership with researchers at Harvard University.
Dell has already begun shipping its platform, its first system specifically tailored and packaged for the life sciences market, to clients such as the National Cancer Institute. This is the same system that TGen uses for its sequencing projects. Entry pricing for the system can start as low as $350,000, but that could change depending on the specific configuration the user wants or needs. As such, "it's best to contact Dell sales for a customized quote," Stemple said.