Isilon Says New Storage Systems Meet Data Challenges of High-Throughput Genomics

By Uduak Grace Thomas

EMC subsidiary Isilon this week launched two network-attached storage hardware platforms in a bid to address the challenges of so-called "big data" environments, which are generally on the order of tens to hundreds — and even into the thousands — of terabytes.

Company officials said the systems should be of interest to customers in the life science market who are looking to store north of 20 terabytes. Life science currently represents Isilon's second-largest vertical market, behind media and entertainment, and the firm claims approximately 100 customers in the life science sector.

The first new product, the Isilon S200, is a high-performance system that provides more than 1.4 million NFS ops and more than 85 gigabytes per second of aggregate throughput from a single file system. It also includes Intel's Xeon 5600 quad-core processors with STEC SSD and Hitachi SAS drives, as well as 10-gigabit Ethernet front-end networking and 13.8 terabytes of globally coherent cache.

The second product, the Isilon X200, was designed to provide a blend of lower price and performance, and offers users more than 30 GB per second of aggregate throughput. Users can customize the platform to meet their unique workflow requirements.

Isilon said the X200 allows customers to choose from a range of drive configurations, including solid-state drives with either serial-attached SCSI or serial ATA drives, and combine them with a large, globally coherent cache and next-generation quad-core processors to deliver optimum price performance for accelerating big data access.

The S200 platform has a starting price of $57,569 per node, while the X200 begins at $27,450 per node.

The company has also released new versions of its OneFS operating system and SyncIQ replication software. OneFS 6.5 is standard with any S200 or X200 purchase, while SyncIQ 3.0 has a list price of $4,950 per node.

Scaling for Life Science Research

Isilon counts among its life science customers the Broad Institute, the Howard Hughes Medical Institute, Complete Genomics, Merck, and the Sanford-Burnham Institute, which recently deployed two 72NL-Series clusters at its campuses in La Jolla, Calif., and Orlando, Fla.

Eric Hicks, director of information technology at the Sanford-Burnham Institute, told BioInform that the center focuses on a range of projects in cancer, pediatric disease research, immunology, infectious diseases, diabetes and obesity, neuroscience, aging, and stem cells that require some storage capacity, but the institute's bioinformatics and genomics research efforts generate the most information.

Hicks said the group talked to several vendors prior to selecting Isilon, but the firm’s storage tools proved to be an affordable, enterprise-level product that scaled to the institution’s requirements.

He explained that the cost of ownership of Sanford-Burnham’s legacy storage system proved too high and, as a result, researchers resorted to unprotected resources such as USB drives to store data.

Although the data isn’t uniformly spread across all the research areas, the combined data footprint for the center is about 40 terabytes, Hicks said.

Matthew Trunnell, manager of research computing at the Broad, told BioInform that the institute is currently testing one of Isilon's new offerings to support a computational workflow with an I/O footprint that isn't currently covered by its existing platforms.

The new system will add to seven petabytes of data that the Broad currently stores in Isilon systems.

Trunnell explained that the workflow in question is one component of a sequence analysis pipeline that reshuffles tens of terabytes of data coming off sequencing instruments in such a way that it can be accepted by downstream analysis tools.

The process requires "a tremendous amount of I/O in rapid sequence," such that when it's run on a large scale, it creates performance constraints.

Trunnell said the Broad first tapped Isilon for its storage needs several years ago because it "represented the most easily scaled platform" and was the easiest system to run. Currently, the institute has about six Isilon clusters, and Trunnell said the resource has kept pace with the Broad's growing data, which was increasing at a rate of 250 TB per month as of last fall.

Trunnell also noted that while Isilon's storage architecture is "well matched" for a lot of the computation that characterizes genomic work, there are some things it isn't built to handle.

For instance, he said that while the X series and NL series clusters are "terrific" high-throughput boxes, they are also relatively high-latency devices, so they would not be suited for interactive workloads. They are also not well suited to feeding large single clients.

Competitive Environment

At the Bio-IT World conference in Boston this week, Isilon was peddling its wares alongside competing storage vendors DataDirect Networks and BlueArc. Sam Grocott, Isilon's vice president of marketing, told BioInform, in an interview conducted earlier, that the firm is also seeing increasing competition from "build-your-own solutions," where researchers are purchasing hardware and layering on open source or third-party software file systems.

However, he added, these types of solutions typically prove to be too complex for the average life science researcher, leading them to seek solutions that are more turnkey, such as Isilon's offerings.

The company is also facing competition from HP and IBM, which have recently moved into the scale-out storage market. IBM launched its Scale Out Network Attached Storage, or SONAS, platform early last year, while HP purchased storage firm IBRIX in 2009 to extend its scale-out storage capabilities.

According to Grocott, another competitor, Panasas, offers "a true scale-out storage solution" that's similar in architectural design to Isilon's offerings and differs only in hardware architecture implementation. The Panasas system is blade-based while Isilon's is an appliance node-based system.

Grocott noted, however, that Panasas requires users to deploy software on all clients, which he described as its biggest hurdle.

Nick Kirsch, Isilon's director of product management, added that the Panasas solution is more appropriate for organizations such as nuclear physics labs that want maximum performance from their systems and are willing to pay extra for it. He noted that life sciences researchers tend to opt for good quality performance at a lower cost.


Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.
