Just when you thought the bioinformatics industry was about to choke to death on its own data — indeed, Craig Venter predicted last year that the industry would soon graduate from generating terabytes and petabytes to exabytes — it appears that most companies might not have to worry.
For example, MDS Proteomics is currently making do with nine terabytes of online storage split between its main data center in Toronto and a second lab in Odense, Denmark. Senior director of IT John Sulja expects to double that capacity in the next year, but at roughly half the cost of the initial storage installed in 2000.
Sulja says his company managed to sidestep a data storage nightmare partly through effective management. But he adds, “It’s also a matter of culture. Our scientists and our IT people interact on a daily basis. We know what’s going on, and we can react well ahead of a need arising.”
Similarly, Structural Genomix in San Diego has two to three terabytes of storage online, and another 10 to 12 terabytes available in archived tape backup. Chad Smith, the associate director for IT, says this capacity should be sufficient for at least the next year. While the volume is nothing to sneeze at, it's hardly the order-of-magnitude leap the industry had been bracing for.
In an environment where the Food and Drug Administration mandates that every last bit of data be saved, a primary concern is the stubborn storage hog. Some people in the labs have made end runs around determined IT staffs by keeping files in e-mail inboxes rather than on shared network drives.
Smith says that for SGX the solution comes down to “the intelligent use of your storage. You don’t necessarily leave everything on disk. If you can figure out what can be compressed, and hold on to experimental results rather than the experiment itself, that makes a big difference.”
— Joseph Radigan