BlueArc Optimizes Storage System to Create 'Recipes' for Use with Illumina Sequencers


By Uduak Grace Thomas

This article has been updated from a previous version to clarify a comment from Andersson regarding the company's servers.

BlueArc said this week that it has configured its network storage options to provide "complete compatibility" with Illumina's line of next-generation sequencers.

The company expects that the new configurations will allow its storage software to meet the demands that increasing quantities of sequence data and post-processing analysis place on storage infrastructure — particularly for shared storage systems.

BlueArc teamed up with Illumina in January of this year to run a series of tests combining its storage system with Illumina's pipeline applications. The tests were designed to outline a set of configurations required to optimize the software when combined with the sequencers.

According to Bjorn Andersson, BlueArc's solutions director for the life sciences market, the optimization was required to meet the increasing quantities of data generated from next-generation sequencers. This data is often stored in shared spaces, making it difficult for multiple users to access it simultaneously.

"Researchers sometimes are sort of the canary in the mine when they try to access the results of their own data in shared storage and they run into some kind of problem," he told BioInform. "We wanted to do everything we can to create 'recipes' for how to best use our equipment, that we can use both for our engineering when we design new products, but also to help customers with smaller installations get the most out of their installations."

Using real data from some of its customers, BlueArc simulated a typical research environment generating large quantities of data from multiple Illumina sequencers and then depositing the data into a communal storage space. The company "rented pretty good sized clusters, 120 nodes, each with four cores, that we were able to use to run the Illumina applications and basically recreate one of our larger customary installations with dozens of Illumina sequencers," Andersson said.

He said that several major BlueArc customers participated in the testing, but could not release their names.

According to Andersson, the tests revealed several points about the storage system. "We learned that on one hand if you go to the larger type of installations, they fit very well with our Titan or high-end type of our service," he said. "Starting with just a couple of storage server nodes and then scaling up we can do up to eight in a cluster."

The tests also highlighted the need for storage infrastructure that grows with research and makes large quantities of data easier to manage, with fewer disruptions. "One of the key things that we learned from it was that things change much, much faster than you can actually change the IT infrastructure," Andersson said.

He went on to say that tests run on BlueArc's software showed that it already has the necessary features built in to handle the rigors of changing workloads. One such feature is dynamic read caching.

"If you find yourself limited by the line rate speed of a 10-gigabit network link for read operations, we can turn on our dynamic read cache to automatically and dynamically aggregate the line rate performance over multiple cluster nodes. With our test setup we observe that we got close to linear scale-up," he said.
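As a rough illustration of what near-linear scale-up would mean, the sketch below computes the ideal aggregate read bandwidth when each cluster node contributes one 10-gigabit link. The function name, the efficiency parameter, and the one-link-per-node assumption are hypothetical and do not come from BlueArc; real-world throughput would be lower because of protocol and file system overhead.

```python
# Hypothetical back-of-the-envelope model; not BlueArc's method or measurements.
LINK_GBITS_PER_SEC = 10  # line rate of one 10-gigabit link, as quoted above


def aggregate_read_gbytes_per_sec(nodes, efficiency=1.0):
    """Ideal aggregate read bandwidth in gigabytes per second.

    Assumes one 10-gigabit link per node (an assumption for illustration)
    and a scaling efficiency in (0, 1], where 1.0 is perfectly linear.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Divide by 8 to convert gigabits to gigabytes.
    return nodes * LINK_GBITS_PER_SEC / 8 * efficiency


# Eight is the maximum cluster size Andersson cites.
for n in (1, 2, 4, 8):
    print(n, aggregate_read_gbytes_per_sec(n))
```

Under these assumptions, a single node tops out at 1.25 gigabytes per second, and a fully linear eight-node cluster would reach 10 gigabytes per second.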


A Growing Market

The next-gen sequencing market stands to be a lucrative one for storage vendors as many labs are ramping up their storage to keep pace with new sequencing instruments. This week, for example, the Southwest Foundation for Biomedical Research said that it will increase its storage capacity from 50 terabytes to 500 terabytes in preparation for large-scale sequencing studies, while the Fred Hutchinson Cancer Research Center said it plans to expand its storage to "beyond a petabyte" (see stories here and here this issue).

BlueArc faces some stiff competition from other network storage providers like Isilon and Panasas, however. In the last two months alone, Isilon has announced agreements with China's BGI, the National Center for Genome Resources, and Mount Sinai School of Medicine's Genomics Institute — all to support next-gen sequencing instruments.

Chris Dagdigian, BioTeam’s founding partner and director of technology, who has worked with BlueArc, told BioInform that BlueArc's software configuration plan makes it easier for researchers to make informed decisions about which storage software to use.

"A couple of years ago, the instruments were putting out so much data themselves that you would have more storage supporting your instruments than you would have in the entire rest of your company," he said. "What's changed recently is that the instruments are not producing the vast amounts of data that they were or they're getting better at reducing that data into smaller bits before they send it off to the users."

According to Dagdigian, designing an appropriate storage system is relatively easy, but the difficulty for many researchers with limited IT infrastructure experience is deciding how much storage space they require. He attributed this to the ways in which researchers can now manipulate and share data.

"Scientists are slicing and dicing the data into different formats. They are sharing it with collaborators, they are making mash-ups, and they are taking up tons of disk space," he said.

"I think in general anything that makes the system easier to design pre-purchase so that you buy the right pieces is better," he said. "People who buy solutions that are happy with them are going to talk about it and that's one of the things that BlueArc would benefit from."

Although Dagdigian declined to comment on storage software that he hasn't personally used, he said that he has used storage systems from both BlueArc and competitor Isilon and considers the two comparable. He said that the choice of software depends on what works best for the infrastructure of the purchasing organization.

"The way you would purchase and buy Isilon is slightly different from the way you would purchase and buy BlueArc," he said. "With BlueArc, you have a lot more options in terms of the choices of disc tiers and other things, so you have the ability to sort of build either slightly more customer complicated infrastructures or the ability to mix and match it to other competing needs at the same time."

Having options is beneficial if the purchaser understands both the scientific and IT requirements of the software, but, according to Dagdigian, that isn't always the case.

"Sometimes the purchasers are researchers who really don't have a lot of IT experience backing them up on their team, and then other times the purchasers are a corporate IT organization and they might not have a really solid understanding of the scientific requirements and what the instrument actually requires," he said.

He added that BlueArc and Illumina's efforts will help researchers and IT organizations navigate the complex process of selecting and designing software systems capable of handling multiple users. “[BlueArc and Illumina are] making it a little bit easier to get your head around all the different options that you have to put together and assemble,” he said.

Andersson said the company next plans to configure its software to work with other sequencing vendors. He said that the tests gave both his company and Illumina a better understanding of how their systems work and put them in a better position to meet customer needs.

In the meantime, Andersson said BlueArc will continue testing its software capabilities. "We've since built and are expanding general application characterization capabilities in-house for ongoing testing but may also decide at some point to rent a cluster again to do larger-scale testing," he said.

"We believe very firmly that to be successful in this area, you really need to understand the customer environment," Andersson continued. "With the effort and the investment we are making here, we are deepening our knowledge [about] what customers are experiencing when they set up sequencing environments and we are able to work together with customers and provide a much better solution."
