It was only two weeks ago that Affymetrix probe sequences emerged from secrecy, yet now they seem to have lost their public inhibitions entirely: The complete probe sequences are now being integrated into two prominent data repositories, the University of California, Santa Cruz “golden path” assembly of the human genome, and the European Bioinformatics Institute’s new ArrayExpress expression database, which EBI unveiled just prior to this week’s Microarray Gene Expression Database working group (MGED) meeting in Boston.
The integration of its probe sequences into golden path and ArrayExpress is part of a new strategy that Affymetrix calls its “open access initiative.” Responding to a long-standing request from users for access to the full sequence of the oligonucleotide probes on its microarrays, the company first posted these probe sequences on its website, NetAffx. Now it is also working with outside informatics players to help them integrate this sequence data into their software.
“We recognize that standards are going to be important, and it’s important for our customers that those standards can deal with the Affymetrix data,” stated Peter Dansky, senior director of marketing, informatics at Affymetrix.
Affymetrix had already taken an active part in the MGED effort to develop the Microarray Gene Expression (MAGE) data model, which includes the data exchange format MAGE-ML and the object model MAGE-OM.
When Affymetrix announced it was making public its probe sequences in late January, Alvis Brazma, the EBI microarray informatics team leader behind ArrayExpress, and his colleagues decided to include the oligo sequences in the database as a response to what they saw as “a positive change” in Affymetrix’s policy.
“As soon as they decided to make the oligos public, we started working towards including them in the database,” Brazma said in an e-mail to BioArray News. “In practice it will take a while to complete this as both sides have limited technical resources, but we will work hard to have at least one complete array description in the database very soon.”
These oligo sequences will include probes from the U133 human genome array set, the two-chip update on the human genome that the company released at the end of January.
Rubber Stamping the Affy Standard
The fact that these two players in the public-sphere data world have accepted Affymetrix probe sequences into their databases also signals that Affymetrix has been accepted as an industry standard.
Golden path was the program credited with saving the Human Genome Project from the brink of failure in its race with Celera to sequence the human genome. This assembly algorithm, which was cooked up in a flurry just over a year ago by Jim Kent, a graduate student in biology at UCSC, allowed the International Human Genome Sequencing Consortium to puzzle together the cluttered clouds of sequence information that its various appendages had generated during the course of the project.
Affymetrix used this assembly of the genome to design the probes for its most recent product, the U133 human genome chip. Now UCSC scientists are adding a “track” to the tiled layers of sequence fragments in golden path that consists of the probe sequence information, said Dansky. “If a gene is represented on an Affymetrix array, you will see exactly where the [probe sequence] is located and have the ability to link into Affymetrix’s website to get more information on it.”
As part of the UCSC initiative, the Genomics Institute of the Novartis Research Foundation will also submit expression data derived from 31 human samples and a panoply of normal and diseased tissues to the database. The expression data, which was obtained from Affymetrix arrays, will be aligned with the genomic sequence and will serve as a reference set of gene expression data.
MGED Gets Affy-ble
ArrayExpress (www.ebi.ac.uk/arrayexpress), which EBI is launching this spring as a free, web-based data submission tool, is designed to enable the standardization of microarray data under the Minimum Information About a Microarray Experiment (MIAME) standard. The point of this standardization movement is to enable researchers to compare data from microarray experiments performed at different labs under disparate conditions.
Working groups at MGED, a group founded at the first Microarray Gene Expression Database meeting held in Cambridge, UK, in November 1999, hammered out this standard at previous years’ conferences and have been developing ways to disseminate it through the microarray community, one of which is ArrayExpress. The integration of Affymetrix probe sequence data into this database tool effectively allows Affymetrix to piggyback onto the standardization movement.
“In ArrayExpress, there are different fields of information [to fill out] that fit into the MIAME standard,” said Dansky. “These fields include sequence information for target probes. So we are using the new data interchange standard to submit the probe sequence data.”
With the introduction of its probe sequences into these two large databases, Affymetrix has not only given its endorsement to these software standards but has also gone a long way toward ensuring that any standards the scientific community adopts are centered on the Affymetrix platform.