As high-content cellular screening becomes increasingly important to drug discovery, companies using the approach face a crucial question: What to do with all that data?
Cellomics and EMC began exploring a partnership last August to answer this question, which culminated in a December deal. Last week, Roche signed on as the first pharmaceutical company to use the combined Cellomics-EMC data management system.
Cellomics, of Pittsburgh, Pa., markets the ArrayScan platform, which several major pharmaceutical companies use for high-content, image-based cell screening in drug discovery and development. The partnership Cellomics forged with EMC, of Hopkinton, Mass., made the ArrayScan compatible with EMC's Centera content-addressed storage device, so that Cellomics customers could better manage the large volumes of data the platform generates.
The ArrayScan comes with its own data management tools, but Cellomics quickly saw that its users were generating more data than the platform could handle. "We have a large customer base, and it's a fairly mature customer base from a high-content screening perspective," said Mark Collins, Cellomics' senior product manager. "Many of those customers were hitting the wall with respect to storage. Our customers were saying that we needed a new solution."
One of those customers was Roche, which, according to Collins, has used the ArrayScan for several years. Collins told Inside Bioassays: "Leon Garfinkel, [director of research IT] at Roche, approached me last year and said: 'We're looking into expanding high-content screening. Where are you going with high-content informatics? Take a look at EMC Centera.' And I had already begun to take a look."
EMC offers information storage and management products and services to several industries, but its Centera system is particularly suited to storing and managing fixed digital content (content that does not change once it is created), such as x-rays, electronic documents, and check images. Roche and Cellomics both realized that such a system would also be ideal for fluorescence microscopy images and their related data.
Pharmaceutical companies have been seeking informatics solutions for years, but especially since 1997, when the FDA established 21 CFR Part 11, which sets out the criteria under which the agency treats electronic records and electronic signatures as equivalent to paper records and handwritten signatures.
Companies have found it challenging enough to comply with FDA regulations in this area while running biochemical high-throughput assays, but the ArrayScan is one of several instruments on the market that analyze living, intact cells. The data generated by biochemical assays, usually one or a few data points per well, is small potatoes compared with the output of image-based cell assays.
"With image-based assays, you can get hundreds of points of data per cell per well," said Judy Masucci, Cellomics' director of marketing. "And you might have 100 or more cells per well."
What this translates into, she said, is that biochemical assays might generate data in the megabyte-to-gigabyte range, while cell-based assays can generate multiple terabytes.
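To make Masucci's figures concrete, the short calculation below multiplies out the per-well numbers she quotes. It is an illustrative sketch only: it takes the low end of "hundreds" (100 points per cell) and "100 or more" cells per well, and it assumes a 384-well plate, which the article does not specify. It counts only the numeric measurements; the raw images behind them, which the article suggests drive the terabyte totals, are not modeled here.

```python
# Illustrative parameters only. The 384-well plate format is an assumption;
# the per-cell and per-well figures are the low ends of those quoted above.
POINTS_PER_CELL = 100          # "hundreds of points of data per cell"
CELLS_PER_WELL = 100           # "100 or more cells per well"
WELLS_PER_PLATE = 384          # assumed plate format (not stated in the article)
BIOCHEM_POINTS_PER_WELL = 1    # "usually one or a few data points per well"


def values_per_well(points_per_cell: int, cells_per_well: int) -> int:
    """Measured values produced by one well of an image-based cell assay."""
    return points_per_cell * cells_per_well


def values_per_plate(per_well: int, wells: int) -> int:
    """Measured values produced by a whole plate."""
    return per_well * wells


cell_well = values_per_well(POINTS_PER_CELL, CELLS_PER_WELL)
cell_plate = values_per_plate(cell_well, WELLS_PER_PLATE)
biochem_plate = values_per_plate(BIOCHEM_POINTS_PER_WELL, WELLS_PER_PLATE)
ratio = cell_well // BIOCHEM_POINTS_PER_WELL

print(f"cell-based values per well:   {cell_well}")
print(f"cell-based values per plate:  {cell_plate}")
print(f"biochemical values per plate: {biochem_plate}")
print(f"per-well ratio: {ratio}x")
```

Even at these conservative settings, each cell-based well yields about 10,000 times as many measurements as a single-readout biochemical well, before any image storage is counted.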
Previously, users of high-content imaging systems had to manage that data themselves, usually storing it on "piles of DVDs or CDs," Collins added. The problem with such an arrangement is that 21 CFR Part 11 requires that high-content screening data supporting a new drug candidate be retrievable quickly and accurately. If a company needs to find, for example, all the toxicity data from the kind of secondary screening commonly performed on the ArrayScan, it needs a more sophisticated data management system.
"What you don't know is how far the FDA will reach back into research data when you do a new drug application," Collins said. "It may come along and say: 'I'd like to see your high-throughput screening results, please.' Right now, there's no official word on whether the images are really the raw data, or whether we can use an algorithmic transformation of it, but right now, no one's taking any chances."
The ArrayScan database system, called Store, provides some help in this area by automatically organizing and storing data. "But we don't store the images for very good performance reasons, and even then you would still have a lot of discs that you'd have to put the image-based data on," Collins said. "There would still be some IT overhead to manage that file storage or image storage."
Collins hopes the Centera system will give customers some peace of mind on this front. "We can show with the EMC Centera system that data has never changed, and there's a clear audit trail of when it was done and who did it," he said. "And that audit trail is an immutable piece of information — it gets tagged to the image."
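The general idea behind content-addressed storage can be sketched in a few lines. This is a hypothetical, in-memory illustration, not EMC Centera's actual interface or hashing scheme: each object's address is a SHA-256 digest of its bytes (an assumption chosen for the sketch), so altered content would no longer match its address, and every store or read is appended to an audit log keyed to that address. The class and method names (`ContentAddressedStore`, `put`, `get`, `verify`, `audit_trail`) are invented for this example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable audit record: which object, what action, who, and when."""
    address: str
    action: str
    user: str
    timestamp: str


class ContentAddressedStore:
    """Toy write-once, content-addressed store (hypothetical; not EMC's API).

    The address of an object is the SHA-256 hex digest of its bytes, so the
    same content always maps to the same address, and any change to stored
    bytes is detectable because they would no longer hash to their address.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._audit: list[AuditEntry] = []

    @staticmethod
    def address_of(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def _log(self, address: str, action: str, user: str) -> None:
        # Audit entries are only ever appended, never edited or removed.
        stamp = datetime.now(timezone.utc).isoformat()
        self._audit.append(AuditEntry(address, action, user, stamp))

    def put(self, data: bytes, user: str) -> str:
        address = self.address_of(data)
        # Write-once: storing identical content again keeps the original copy.
        self._objects.setdefault(address, bytes(data))
        self._log(address, "store", user)
        return address

    def get(self, address: str, user: str) -> bytes:
        data = self._objects[address]
        self._log(address, "read", user)
        return data

    def verify(self, address: str) -> bool:
        """True if the stored bytes still hash to their address (unchanged)."""
        return self.address_of(self._objects[address]) == address

    def audit_trail(self, address: str) -> tuple[AuditEntry, ...]:
        """Audit records attached to one object, oldest first."""
        return tuple(e for e in self._audit if e.address == address)


if __name__ == "__main__":
    store = ContentAddressedStore()
    addr = store.put(b"plate 1, well A01, channel 1 image bytes", user="analyst")
    store.get(addr, user="reviewer")
    print(addr[:12], store.verify(addr))
    for entry in store.audit_trail(addr):
        print(entry.action, entry.user, entry.timestamp)
```

Because the address is derived from the content itself, a reviewer holding an address can confirm that the retrieved bytes are exactly what was stored. The append-only log plays the role of the "who and when" trail Collins describes. A production archive would also need durable storage and access control, which this sketch omits.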
Collins also said that Roche’s Nutley, N.J.-based division will be the primary user of the data storage platform, with the expectation that if things work out, the company may adopt the system worldwide for storing high-content screening data. He also said that partnerships with other pharmaceutical companies are in the works.
“Centera is actually EMC’s fastest-growing ever product line,” he said. “We’ve gone and done a couple of joint presentations, and there is another pharmaceutical company looking at this.”