Insightful, Genedata Latest to Adapt Microarray Software Platforms to MS Biomarker Analysis


Researchers working on mass spectrometry-based biomarker analysis will soon be able to choose from a range of new software developed by firms that may be more familiar to their colleagues in the microarray lab.

This week, Genedata released a new version of its Expressionist platform that adds biomarker discovery to the software's existing gene expression analysis capabilities. In addition, the National Cancer Institute awarded statistical software firm Insightful two contracts worth around $186,000 to develop biomarker-discovery tools that will be based on the company's S-Plus 7 statistical software and S+ArrayAnalyzer microarray-analysis platform.

These announcements follow on the heels of Agilent's disclosure last week that it is developing a version of its GeneSpring microarray analysis platform for proteomics biomarker analysis that it expects to launch in a few months [BioInform 02-03-06].

The three firms will join Rosetta Biosoftware, known for its Rosetta Resolver gene expression-analysis software, which last June launched its Elucidator biomarker analysis package. These companies all face a number of entrenched competitors in the biomarker analysis market, including mass spec instrument vendors and other companies in the proteomics market (see table, below).

Why such interest in biomarkers from the gene expression sector? According to Genedata and Insightful, microarray analysis software serves as a logical jumping-off point for proteomics biomarker discovery.

Biomarker analysis is "just one of the things that naturally falls out of our workflow, and our workflow is one that we've obviously developed for Affymetrix GeneChip analysis," Tobe Freeman, manager of public relations at Genedata, told BioInform. The key, he said, "is the establishment of what you'd call a data matrix — a raw file of microarray data, or a chromatogram, in the case of MS data — and we've fitted that into the framework that we know best."

Michael O'Connell, director of life science solutions at Insightful, nearly echoed Freeman's words. "The way we look at proteomics data, it's a different data container at the front end of the workflow, and there are different issues around normalization, pre-processing, getting rid of noise, extracting signal — there are different issues involved in that than in microarrays, so it's a fundamentally different data container," he said. "But after you've done the cleanup, the pre-processing piece, you essentially come up with the same data matrix for the downstream processing as you have with microarrays."
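
The "data matrix" both executives describe can be illustrated with a minimal Python sketch. It assumes that pre-processing has already reduced each spectrum to a list of (m/z, intensity) peaks. The function name, bin width, and toy values are hypothetical and are not drawn from Expressionist or S-Plus.

```python
# Hypothetical sketch: the function name, bin width, and toy spectra are
# illustrative assumptions, not taken from any vendor's software.

def spectra_to_matrix(spectra, mz_min, mz_max, bin_width):
    """Bin each spectrum's (m/z, intensity) peaks onto a shared m/z grid.

    Returns one row per sample and one column per m/z bin, the same
    samples-by-features shape that a microarray expression matrix has.
    """
    n_bins = int((mz_max - mz_min) / bin_width)
    matrix = []
    for peaks in spectra:
        row = [0.0] * n_bins
        for mz, intensity in peaks:
            if mz_min <= mz < mz_max:
                idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
                row[idx] += intensity  # sum intensities that share a bin
        matrix.append(row)
    return matrix


spectra = [
    [(1000.2, 5.0), (1000.7, 3.0), (1502.1, 8.0)],  # sample 1
    [(1000.4, 4.0), (1999.9, 2.0)],                 # sample 2
]
matrix = spectra_to_matrix(spectra, mz_min=1000.0, mz_max=2000.0, bin_width=500.0)
print(matrix)  # [[8.0, 8.0], [4.0, 2.0]]
```

Once the data are in this shape, downstream statistics need not know whether a column came from a probe set or an m/z bin.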

Likewise, Rosetta's Elucidator "is built upon the Resolver system platform and includes most of the downstream analysis capabilities," Bill Kaufman, a Rosetta spokesman, wrote to BioInform via e-mail. "Given the stability of that platform, we were able to focus much of the development work on the protein-specific components of the Elucidator system."

Kaufman was unable to provide details on the level of adoption that the company has seen since launching Elucidator six months ago, but he said that Rosetta is seeing "increasing demand" for the system from pharmaceutical, biotech, and academic institutions.

Insightful: Rigorous Pre-Processing

For Insightful, which does not yet have any concrete commercialization plans for its software, the key development focus will be on the pre-processing half of the equation.

O'Connell cited early papers in the field of proteomics biomarker discovery that were later criticized for a lack of statistical rigor in the pre-processing stage. "Because they didn't do a good job on the pre-processing, they pulled up markers that were really just systematic experimental design issues rather than true markers," he said. "So we've embraced that rigor in our approach to this, and we're spending a substantial amount of time on the pre-processing side."

While the project is still in its early stages, O'Connell said that Insightful is exploring the use of the wavelets module in S-Plus for this aspect of the project.
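
Insightful has not described its method, and the S-Plus wavelets module is not shown here. As a rough illustration of the general idea behind wavelet-based noise removal, the Python sketch below applies one level of Haar wavelet shrinkage to a signal. The function name and threshold are assumptions made for the example.

```python
import math

# Hypothetical sketch of generic wavelet shrinkage; this is not
# Insightful's method or the S-Plus wavelets API.

def haar_denoise(signal, threshold):
    """One-level Haar wavelet shrinkage for an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    # Forward transform: pairwise averages (approx) and differences (detail).
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    # Soft threshold: shrink each detail coefficient toward zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    # Inverse transform rebuilds the smoothed signal.
    out = []
    for a, d in zip(approx, shrunk):
        out.extend([(a + d) / s, (a - d) / s])
    return out


smoothed = haar_denoise([1.0, 3.0, 10.0, 10.0], threshold=100.0)
```

With a threshold of zero the signal is returned unchanged. With a very large threshold each pair of points collapses to its mean. Real spectra would call for multiple decomposition levels and a data-driven threshold.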

On the biomarker-discovery side, Insightful has a number of machine-learning and biostatistical tools in S-Plus and ArrayAnalyzer that should be able to handle the processed proteomic data, he said. The company is looking into both feature-elimination and feature-addition approaches to identifying sets of biomarkers, and is also exploring the use of ensemble classifiers, according to Jill Goldschneider, research director of Insightful.

"That's what the scientists are working on," Goldschneider said, adding that "what ends up in product is a totally different story."
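
The difference between the feature-addition and feature-elimination approaches mentioned above can be sketched in a few lines of Python. This is a generic greedy illustration, not Insightful's code. The nearest-centroid scoring function, the function names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
# Hypothetical sketch of greedy feature addition and feature elimination.
# The scoring function and data are illustrative assumptions only.

def centroid_accuracy(X, y, features):
    """Resubstitution accuracy of a nearest-centroid classifier on `features`."""
    if not features:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, lab in zip(X, y):
        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        if min(sorted(centroids), key=dist) == lab:
            correct += 1
    return correct / len(y)


def forward_selection(X, y, k):
    """Feature addition: greedily add the feature that most improves the score."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda f: centroid_accuracy(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def backward_elimination(X, y, k):
    """Feature elimination: greedily drop the feature whose removal hurts least."""
    kept = list(range(len(X[0])))
    while len(kept) > k:
        drop = max(kept, key=lambda f: centroid_accuracy(X, y, [g for g in kept if g != f]))
        kept.remove(drop)
    return kept


# Toy data: only feature 1 separates the two classes.
X = [
    [5.0, 1.0, 3.0], [4.0, 1.2, 7.0], [5.0, 0.9, 5.0],
    [4.0, 9.0, 4.0], [5.0, 9.1, 6.0], [4.0, 8.8, 5.0],
]
y = ["healthy"] * 3 + ["disease"] * 3
print(forward_selection(X, y, 1), backward_elimination(X, y, 1))  # [1] [1]
```

Scoring on the training samples themselves, as done here for brevity, overstates accuracy. A real analysis would score each candidate set by cross-validation or on held-out samples.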

Goldschneider said that one challenge the company is facing is that "the technologies keep evolving, the data sets keep growing, and we need to come up with configurable technologies that can adapt to the changes."

Nevertheless, Insightful is confident that it can get a product on the market at some point. "The fact that we have the ArrayAnalyzer tools there for downstream processing makes that more tangible," O'Connell said. "If we can get the pre-processing part solved, then we do have some downstream tools for plugging that together."

O'Connell said that the market opportunity for software in this area is still difficult to pinpoint, but "there's definitely interest." He added, "Drug companies that we're talking to have made significant investments in their proteomics platforms over the last couple of years, and we certainly have had requests for early versions of our code."

In addition, O'Connell cited the US Food and Drug Administration's Critical Path initiative as a driver for biomarker research in the pharmaceutical industry.

If O'Connell's assumptions are correct, the biomarker field may be getting a bit of a boost from the FDA, which announced this week that of the $1.95 billion President Bush allotted for the agency in his proposed fiscal 2007 budget, it intends to spend $5.9 million on the Critical Path plan — the first time the program will receive formal federal funding since it was created in March 2004.

The agency said in a statement that as part of its Critical Path program, it intends to deliver to industry "concept papers and draft guidances for industry on the framework for qualifying new safety and efficacy biomarkers for drug development, such as genomic and proteomic assays."

Genedata: Memory Caching for Raw Data

Like Insightful, Genedata has not pinned a number on the market for its new software, but "we're pretty confident that there are people out there who are thinking in terms of the biomarker framework," Freeman said. "They're thinking of drug discovery in terms of this measurement-based biomarker framework."

Genedata presented a poster on Expressionist's ability to identify mass spec biomarkers at last August's HUPO meeting, and this week's release represents a productized version of that research project, Freeman said.

Freeman acknowledged that when the firm first began developing what it's calling the "cross-omics data analysis" capability, it was targeting "a few early adopters or dreamers within the pharmaceutical industry." Now, he said, "We're actually finding people turning up who do want to map their proteomics data through to some sort of transcriptomic framework. And that's really nice, because in the beginning we had to think to ourselves, 'Well, obviously there are a few people thinking about this, but is it really going to catch on?'"

Freeman said that the Basel, Switzerland-based software firm will compete in the quickly crowding biomarker software market on the strength of Expressionist's "multi-level memory caching system," which allows the software to process hundreds of mass spec chromatograms simultaneously. "No one else as far as I know has the ability to process hundreds of raw chromatograms at one time," he said.

The ability to process raw chromatograms is crucial, because "if you don't have access to that chromatogram, then you can't check whether all of the analysis that identifies proteins has really worked or not," Freeman added.

— Bernadette Toner

Companies Marketing or Developing MS Biomarker Analysis Software

Company | Product Name | Release Date | Description
Microarray Software Vendors
Agilent Technologies | GeneSpring MS | First half of 2006 | Version of GeneSpring expression analysis software targeted to mass spec biomarker analysis
Rosetta Biosoftware | Elucidator | | Includes raw data management, LC/MS data processing for quantitative and differential analysis, protein identification, and high-level analysis at the peptide and protein level.
Genedata | Expressionist Pro 3.0 | | Adds proteomics- and metabolomics-based biomarker analysis to Expressionist's expression analysis capabilities.
Insightful | N/A | | R&D-stage project to develop biomarker analysis tool based on S-Plus and S+ArrayAnalyzer software tools
Proteomics Software Vendors
Nonlinear Dynamics | Progenesis PG600 | | Analyzes trace data, visualizes spectra, and includes several hierarchical analysis methods including neighbor joining and UPGMA clustering to help identify groups of samples
Mass Spec Instrument Vendors
Applied Biosystems | MarkerView | | Profiles both proteomic and metabolomic biomarkers
Bruker Daltonics | ClinProTools | | Version 2.0, released June 2005, includes a genetic algorithm called QuickClassifier to speed classification results for large data sets.
Ciphergen | Biomarker Patterns Software | April 2004 (version 5.0) | Package for supervised classification of SELDI mass spectral data sets from Ciphergen's ProteinChip platform
PerkinElmer | BioXpression | | Service based on "several software offerings"
Thermo Electron | N/A | First half of 2006 |
Correlogic | Proteome Quest | | Built upon the company's Knowledge Discovery Engine pattern recognition software, which iteratively processes more than 15,000 candidate biomarkers until it finds a set that optimally segments diseased from healthy samples
Predictive Diagnostics | BAMF (Biomarker Amplification Filter) | | Available via services and collaborative model
