SOLUTIONS: How Two IT Firms, a Philanthropist, and Wal-Mart Built a Better Mouse Database


TeraGenomics may well be the first life science informatics firm that has Wal-Mart to thank for its existence: the ubiquitous bargain emporium played an important, albeit indirect, role in the formation of the company, a business unit of McLean, Va.-based IT consulting firm Information Management Consultants.

It all began when Salk Institute neurologist Carrolee Barlow and her colleague David Lockhart were working on a gene expression atlas of the entire mouse brain. They wanted not only to store the raw gene expression results but also to house them so that they could be easily queried and retrieved before being run through the higher-order analysis of GeneSpring and other downstream packages. In particular, the Salk researchers were looking for a way to collect and store quality control information about the chips. In trying to accomplish this task, they quickly ran up against the limits of the database architecture commonly used with enterprise gene expression software.

“A lot of people said they had the solution for it and I looked at all of them,” said Barlow. “Everyone said, ‘It’s so hard because you guys are talking about so much data.’ And after we’d gone through this with several different groups, David would make these jokes: ‘Wal-Mart has a lot of data, airlines have a lot of data, banks have a lot of data. We don’t have that much data. Why is this so hard?’”

Barlow estimated that her lab had about a terabyte of data on hand, and “we started to get into a situation where it was really becoming a critical issue.”

Coincidentally, Barlow had developed a friendship with Sudhakar Shenoy, the CEO of IMC. Shenoy had donated money for research on ataxia-telangiectasia, a rare neurodegenerative disease that Barlow studies. The two sat at the same table at a fund-raising event for the research seven years ago, immediately hit it off, and have stayed friendly ever since. Just as Barlow had reached the breaking point in her database problems, “I was having dinner with [Shenoy] and said, ‘What’s the name of your company again?’” Shenoy assured her that Information Management Consultants not only could help, but would be happy to do so, and directed her to IMC senior vice president Gregg Wright.

During her first meeting with Wright, Barlow said, “I’m describing what my problem is, and he says, ‘You know, this sounds very much like the problems that places like Wal-Mart have.’ And I just looked at him and said, ‘Okay, you get it.’”

After evaluating Barlow’s needs, Wright and IMC were confident that a solution could be put together, but recommended that the database be built upon an architecture virtually unheard of in the life sciences market: Teradata, a data warehouse platform commonly used in banking and retail. Wright said that other databases, such as Oracle, are “satisfactory” for “small-scale operations … For very large data warehouses and high-throughput computations of the kind that we’re doing, we feel that this Teradata technology is a better data warehousing technology.”

IMC, Teradata, and the Salk researchers set up an agreement under which the two IT firms would donate their time and technology to build the database, while Barlow and Lockhart would contribute their life science expertise.

A year and a half later, the database is entering its production phase and Barlow couldn’t be happier. Previously, her group was bound to a time-consuming three-step process: they first ran Affymetrix’s Microarray Analysis Suite (MAS) to generate absolute and comparison files for the chips; then imported that data into Bullfrog, a software tool they wrote to filter raw chip data into lists of expression profiles and genes; and then moved that data to GeneSpring. “That’s fine when you have a few hundred chips,” said Barlow, “but when you start to get up to 1,000 chips and a web-based solution, we were having a problem.”

Now, the MAS tools and Bullfrog are incorporated into the database, “So we don’t have to move anything anywhere.” In addition, multiple laboratories and multiple scientists can collaborate in their research using the database over the web. Barlow is currently trying to secure funding to make the database publicly available so that neurologists worldwide will have access to the mouse brain atlas.

Meanwhile, IMC decided to put its newly gained genomics knowledge to work in the form of a new business division dedicated to building similar data warehouses for customers drowning in gene expression data. The division, dubbed TeraGenomics, is led by Wright, and operates under IMC’s life sciences division. Wright said that several other academic groups and a pharmaceutical company are currently evaluating the data warehouse.

While the company is happy to have people evaluating the technology, “We’re still a little early in our product cycle and we have been deliberately reluctant to go around and take out ads or talk it up,” said Wright. Barlow and her colleagues have several papers in press based on work that they conducted using the database, “So we’d like to let the scientific literature speak for itself,” he said.

— BT
