The use of Bayesian statistics for analyzing gene expression data is old hat in the bioinformatics world, but a Cambridge, UK-based startup is applying this modeling method in a new way with hopes of improving the reproducibility of microarray data.
BlueGnome, a bioinformatics firm spun out of the University of Cambridge signal-processing lab just over two years ago, is not just another microarray software firm, according to CEO and co-founder Nick Haan. The difference, he said, is the stage of the analysis process that the company has defined as its sweet spot. Unlike many bioinformatics companies developing new statistical methods for classifying large gene expression data sets, BlueGnome is setting its sights a bit further upstream, at the image analysis step.
BlueGnome claims that its software offers a number of advantages over other approaches, including improved data accuracy and reproducibility, as well as substantial time savings in manual QC. So far, Haan said, research groups at Unilever, Sygen, and Oxford and Cambridge Universities are using BlueGnome’s software in their microarray analysis pipelines, and business is picking up. The eight-person firm is “hiring quite rapidly now,” Haan said.
“People have been applying very advanced Bayesian statistics further down the line for a number of years now,” Haan told BioInform. “But we’re one of the first groups to take that advanced method and bring it right back to the image analysis stage.”
BlueGnome’s first product, BlueFuse, was released in March. The software was designed to statistically model each stage of the microarray analysis pipeline, from sample preparation through hybridization and fluorescence to image capture, based on a range of experimental parameters. BlueFuse creates two separate sets of models for each experimental step: one set represents “what we should expect to find in the data,” Haan said, while the second set models the range of things that can go wrong with each of those processes to introduce noise.
“With those two sets of models (one for the ideal data and one for the noise data), we can actually start to separate signal from noise more accurately in the output data,” Haan said. BlueFuse uses both sets of models to statistically determine where a particular image falls on the continuum between “ideal” and “noise.” In the case of a dust speck on a slide, for example, “our software can say whether this image actually matches the profile of a dust speck far better than it matches the profile of ideal data. And we can make an automated decision about whether it’s a dust speck or ideal data, and if it is ideal data, how good it is.”
This ability to generate a confidence score that ranks the quality of the data generated by a microarray experiment is a key feature of the software, Haan said. “At the moment, people just have these huge databases of numbers, and you really have no idea which number is good, which number is bad, or how much you should trust one data point as opposed to another. And what we’re able to do is to give people a rigorous basis on which to make those decisions.”
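The article does not describe BlueFuse’s actual models, but the general idea Haan outlines (score the same pixels against an “ideal data” model and a “noise” model, then turn the comparison into a probability) is ordinary two-hypothesis Bayesian model comparison. The sketch below is a minimal illustration under loudly stated assumptions: the spot and dust-speck profiles, their parameters, the Gaussian pixel-noise level and the 0.9 prior are all hypothetical, and the model parameters are fixed rather than integrated over as a fuller Bayesian treatment would do. The posterior probability it returns plays the role of a per-spot confidence score.

```python
import math

# Hypothetical model parameters, chosen only for illustration (not BlueFuse's).
BACKGROUND = 100.0   # assumed local background intensity
NOISE_SIGMA = 50.0   # assumed per-pixel Gaussian noise level


def ideal_spot(radii):
    """Hypothetical 'ideal data' model: a smooth Gaussian spot profile."""
    return [BACKGROUND + 1000.0 * math.exp(-r ** 2 / (2 * 3.0 ** 2)) for r in radii]


def dust_speck(radii):
    """Hypothetical 'noise' model: a small, very bright, hard-edged blob."""
    return [BACKGROUND + (5000.0 if r <= 1.0 else 0.0) for r in radii]


def log_likelihood(observed, predicted, sigma=NOISE_SIGMA):
    """Log-probability of the observed pixels if they equal the model's
    prediction plus independent Gaussian noise with standard deviation sigma."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    norm = len(observed) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -0.5 * sq_err / sigma ** 2 - norm


def prob_ideal(observed, radii, prior_ideal=0.9):
    """Posterior probability that the pixels came from the ideal-spot model
    rather than the dust-speck model (Bayes' rule over two hypotheses)."""
    log_a = log_likelihood(observed, ideal_spot(radii)) + math.log(prior_ideal)
    log_b = log_likelihood(observed, dust_speck(radii)) + math.log(1.0 - prior_ideal)
    # Subtract the larger log score before exponentiating to avoid underflow.
    top = max(log_a, log_b)
    a, b = math.exp(log_a - top), math.exp(log_b - top)
    return a / (a + b)


radii = [0, 1, 2, 3, 4, 5, 6]            # pixel distances from the spot centre
offsets = [20, -15, 10, -5, 30, -25, 5]  # stand-in for measurement noise
spot_pixels = [p + d for p, d in zip(ideal_spot(radii), offsets)]
speck_pixels = [p + d for p, d in zip(dust_speck(radii), offsets)]

print(f"spot:  P(ideal) = {prob_ideal(spot_pixels, radii):.3f}")
print(f"speck: P(ideal) = {prob_ideal(speck_pixels, radii):.3f}")
```

With these made-up profiles the two cases are easy to tell apart, so the spot scores close to 1 and the speck close to 0. Pixels that fit neither model well, or both about equally, land in between, which is where a ranked confidence score is more informative than a hard good/bad flag.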
BlueGnome is not alone in focusing its software development efforts on microarray image analysis. Other companies, such as BioDiscovery and Genepix, are well established in this market, while others, such as ViaLogy, are bringing new signal processing tools to bear on the field. But Haan said that current methods rely too heavily on shape recognition and other visual approaches, rather than on rigorous statistics that take each stage of the analysis pipeline into account. “You don’t have a basis to [decide what an image is] accurately without a real understanding of the various processes that could generate the data — for example, what causes dust specks, what causes noise,” he said.
But the company doesn’t want to be pigeonholed in the microarray market. Haan said that the Bayesian method that BlueGnome developed is “generically applicable” across a range of high-throughput life science platforms that rely on image analysis. So far, the company has developed prototype software for analyzing NMR data and high-throughput screening data. Longer term, Haan said, the company would eventually like to “combine the information that comes off those different experimental platforms.”
But for the near term, the company’s goals are clear: “The short-term goal is to really accelerate the sales of our first product, which is the microarray product, and to obtain a significant foothold in that market over the next year,” Haan said.