In its new HG-U133 human genome array set, Affymetrix has reduced the entire genome to two chips. These chips contain about 45,000 probe sets for 39,000 unique transcripts and 33,000 genes, compared to 60,000 probe sets on the previous five-chip U95 set. Do the math, and you realize that the high priest of high-density arrays has nearly doubled the data available per chip, to 22,500 probe sets from 12,000.
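The per-chip figures follow directly from the totals quoted above. As a rough sketch of that arithmetic in Python, using only the numbers in this article (the variable names are illustrative):

```python
# Probe-set totals and chip counts as quoted in the article.
u133_probe_sets, u133_chips = 45_000, 2
u95_probe_sets, u95_chips = 60_000, 5

u133_per_chip = u133_probe_sets / u133_chips  # probe sets per U133 chip
u95_per_chip = u95_probe_sets / u95_chips     # probe sets per U95 chip
density_gain = u133_per_chip / u95_per_chip   # ratio of probe sets per chip

print(f"U133: {u133_per_chip:,.0f} probe sets per chip")
print(f"U95:  {u95_per_chip:,.0f} probe sets per chip")
print(f"Gain: {density_gain:.3f}x")
```

The ratio comes out just under two, which is what "nearly doubled" refers to.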
The HG-U133 “is now the highest density oligo array available, with over 1.2 million individual oligo features per chip,” said Affymetrix marketing manager Elizabeth Kerr. “No one else can touch that.”
If this chip delivers consistent, robust results in the lab, Affymetrix’s competitors may not be able to touch the price either. The company is pricing these chips at roughly the same amount per chip as the U95 set. In other words, a researcher will be able to buy the two-chip set representing the entire human genome for less than half the price of a five-chip U95 set. The academic discounts will also apply, as will other pricing programs, said Kerr. Given that Affymetrix chips have gone to academics for as little as $350 per chip, an academic researcher could now buy the entire human genome set for about $700. Bulk discounts for large pharma customers could also substantially lower microarray costs, although Affymetrix naturally expects that the lower price will simply lead researchers to do more experiments.
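The "less than half" claim can be checked with a quick calculation. The sketch below assumes a flat, identical per-chip price for both sets (the article says only "roughly the same") and uses the $350 academic figure as an example. The function name is hypothetical.

```python
def set_price(price_per_chip: float, chips: int) -> float:
    """Cost of a full array set at a flat per-chip price (an assumption)."""
    return price_per_chip * chips

academic_chip_price = 350.0  # lowest academic price cited in the article

u133_set = set_price(academic_chip_price, 2)  # two-chip HG-U133 set
u95_set = set_price(academic_chip_price, 5)   # five-chip U95 set
ratio = u133_set / u95_set                    # new set cost relative to old

print(f"U133 set: ${u133_set:,.0f}")
print(f"U95 set:  ${u95_set:,.0f}")
print(f"U133 costs {ratio:.0%} of the U95 set")
```

At equal per-chip pricing the ratio depends only on the chip counts, two versus five, so it holds for any discount level that applies to both sets.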
Further seducing researchers, the company has chosen to fully reveal its probe sequences for this new chip, along with the sequences of probes on all of its other chips. These sequences will be available through the company’s NetAffx web portal. A visual tool will display each probe sequence on the U133 next to the comparable probe on the U95. The probes are slightly different, and Affymetrix thought researchers would want to see those differences when comparing research done on different chips.
While the different probes on the new human array could create some short-term bioinformatics headaches for researchers already struggling with chip-to-chip normalization and comparisons of disparate datasets, the probe sequence disclosure will likely make this problem more manageable. The smaller total number of sequences could also reduce analysis headaches, as the company says they reflect the human genome more accurately than those on the U95 arrays.
“We are very confident that these are really transcribed sequences,” said Kerr. “In the past it was hard to get that kind of evidence.”
More Information, Fewer Genes
Affymetrix has the International Human Genome Sequencing Consortium to thank for purportedly increasing the accuracy and reducing the number of probe sets on its arrays. This new set, unlike the previous one, was designed using the draft sequence of the human genome published by the consortium in April 2001. Company scientists aligned sequences from UniGene and other databases to the draft of the genome, and in doing so were able to weed out transcripts that were redundant or inaccurate.
“In the past whereas we might have seen sequence clusters that looked as if they were individual genes, with more information, the sequence ties together to [demonstrate] these are representing the same gene,” Kerr explained.
The company was also able to perform this higher-level genome analysis thanks to its 2000 acquisition of Neomorphic, a Berkeley, Calif., bioinformatics company. Instead of just reading sequence from UniGene (a practice that got Affymetrix in trouble with the U74 murine arrays last March), the bioinformaticians at the Berkeley campus combined information from the Washington University EST database and the University of California, Santa Cruz golden path assembly of the human genome to analyze the raw data from GenBank, dbEST, and RefSeq.
“We spent a lot of time really studying and improving the way we do sequence selection,” said Kerr.
One focus of the bioinformatics group was to trim off randomly generated ‘nonsense’ that was sometimes attached to the ends of sequence reads generated by automated sequencing methods. The company also considered alternative splicing in its sequence selection, although the chip set is not designed to exhaustively probe for splice variants.
These improved sequence selection methods also enabled the company to reduce the number of probes per set from a range of 16 to 20 down to 11, making room for more probe sets on each array.
To help researchers normalize from chip to chip, the new set includes probes for 100 housekeeping genes. The company selected these probes after testing them with numerous tissues and cell lines, and determining that they were consistently expressed across the board.
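One common way to use such control probes is to rescale each chip so that its housekeeping signal matches a fixed target. The Python sketch below illustrates that general idea only. It is not Affymetrix's algorithm, and the function name, probe IDs, intensities, and target value are all made up for the example.

```python
from statistics import mean

def scale_to_housekeeping(chip, housekeeping_ids, target=100.0):
    """Rescale one chip so its mean housekeeping intensity equals `target`.

    `chip` maps probe-set IDs to raw intensities (hypothetical data layout).
    """
    hk_mean = mean(chip[probe] for probe in housekeeping_ids)
    factor = target / hk_mean
    return {probe: value * factor for probe, value in chip.items()}

# Toy data: chip_a was scanned twice as bright as chip_b overall.
chip_a = {"hk1": 200.0, "hk2": 400.0, "geneX": 600.0}
chip_b = {"hk1": 100.0, "hk2": 200.0, "geneX": 300.0}
housekeeping = ["hk1", "hk2"]

norm_a = scale_to_housekeeping(chip_a, housekeeping)
norm_b = scale_to_housekeeping(chip_b, housekeeping)

print(f"geneX on chip A: {norm_a['geneX']:.1f}")
print(f"geneX on chip B: {norm_b['geneX']:.1f}")
```

After scaling, the twofold brightness difference between the two toy chips disappears and the test gene reads the same on both. This only works if the control genes really are expressed at a constant level across samples.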
Affymetrix has recently introduced its new Microarray Analysis Suite version 5.0, which includes a new statistically based algorithm. While previous arrays will work with this program, the new U133 set is designed to be used specifically with this new software, the company said.
The whole package (arrays, software, probe sequences) is ready to go right now. A few select customers, including Gene Logic, have already gotten their hands on the new arrays, and their initial reports are glowing. Like any new research tool, however, the U133 will have to perform in the lab to demonstrate that it lives up to its heady promise.