The Fred Hutchinson Cancer Research Center’s new $4.4 million initiative to identify proteins in human serum indicative of early-stage cancer should add fuel to the fire in the field of protein disease marker discovery. It might also signal the arrival of a new face in bioinformatics: Microsoft.
The heart of the project will be a large-scale human serum proteome database to capture, store, and analyze data. Researchers from the Hutchinson Center and Ruedi Aebersold’s group at the Institute for Systems Biology will work together to analyze samples and identify new protein markers by mass spectrometry. Then, together with Microsoft, they will co-develop the database, which will eventually serve as a GenBank-like public resource for proteomics data, according to Martin McIntosh, a Hutch biostatistician.
But first, they need to build it, which is where Microsoft comes in. Jim Gray, a relational database pioneer and senior researcher at Microsoft’s Bay Area Research Center, fresh from similar stints in other scientific domains, will lend his skills to the project. Gray will focus on optimizing Microsoft’s SQL Server database technology for the nuances of biological data, which is “much more complex than any other kind of data that I’ve had to deal with,” he says.
The early detection project should put Gray’s database skills to the test. As with microarray gene expression data, correlating proteomics data across multiple experiments at different locations is a bioinformatics nightmare. But proteomics experiments present an obstacle that gene expression experiments do not, according to McIntosh: “With SAGE and cDNA arrays, you’re trying to measure something you’ve already identified: you know there’s a gene, and you’re trying to measure its expression. What we’re doing now is trying to identify what’s there, and then we can talk about combining measurements that quantify it. … So the challenge is combining databases that do both discovery and quantification at the same time.”
Gray admits that Microsoft sees a real opportunity in optimizing its database technology for the bioinformatics market. “I don’t expect to see BioInfo 1.0 as a product from Microsoft any time soon,” Gray quips, “but certainly applications like this have unusual needs, and to the extent we can see ways of meeting those needs, we can improve our products.”
— Bernadette Toner