Data Explosion Raises Question of How to Manage It

SAN DIEGO, June 27 – Throughout the exhibit halls and panel discussions at Bio 2001, a question mark lurked behind talks of grand plans to harness genomics-oriented data: how to meet the computing infrastructure needs for collecting, managing, and storing the explosion of information.

"We’re about to hit a wall we don’t see," warned Lisa Kenney, senior director for strategic technology at Incyte Genomics.

On one side of that wall is the "old science" of analyzing single genes or working with incomplete information. On the other side is the integration of the vast array of information beginning to pour in from expression analysis, proteomics, and other genomics research, combined with a more systems-oriented approach to biology.

"By definition, we are still poking a stick at an organism [and] seeing if it moves or doesn't," said George Poste, CEO of Health Technology Networks. "We need to move from where we are now, the phenomenological era, to computational biology."

His message was echoed by many of the panelists at Bio 2001, who predicted that data volumes would grow from the terabyte to the petabyte level within two years, creating a major bottleneck.

"Getting multiple copies of petabytes [of data] in-house is not what you want to do," said Siamak Zadeh, group marketing manager at Sun Microsystems. It "will be a nightmare." Zadeh added that a systems approach would put still more strain on data management and storage, "pushing it up by orders of magnitude."

Douglas Dolginow, senior vice president of Gene Logic, predicted that genomics data would be integrated into clinical trials within five years. Arthur Holden, chairman of the SNP Consortium, said efficacy and safety information for certain drugs will emerge over the next three to five years via database mining and array technology. Francis Collins, director of the National Human Genome Research Institute, said further annotation of the human genome would take place over the next two years. All of this, of course, will generate… data.

"The scale of the growth is so rapid it is unlikely that people will be able to afford a computer [powerful enough]," said Jeffrey Augen, director of business strategy for life science solutions at IBM. "Smaller companies can't afford two teraflops, plus [there is] storage and backup. I'm not sure how it will shake out, but it is the biggest challenge."

Augen suggested that a mix of hosting and grid computing might emerge as a workable model. Sun's Zadeh also sees a place for hosting, predicting that its adoption "will be evolutionary rather than a huge shift."

"Life science and pharma have taken a wait-and-see attitude," Zadeh added.

One thing is clear, however. The emergence of a computing infrastructure that provides access, analysis, management, and storage for the burgeoning genomics data will be good not only for academia, biotech, and pharma, but also for the life sciences divisions of companies that provide computing technology.

"Data management is huge," said Augen. He pointed to one slice of the business – expression array data – to illustrate his point: "In the future, every time you go to the doctor, you will use a biochip. Now, I consume one [PC] every two years; I throw my PC away. More people go to the doctors than own a PC. The scale of the biochip [business] could exceed the scale of semiconductors."

With the biochips will come the need, of course, to search databases, to store the information, and to conduct further analysis. How the industry will get to that point remains an open question.

"People aren't making the investment in computational infrastructure," warned Poste. "No single company will be an island in this."