Affymetrix and a few of its friends at Johns Hopkins have reinvented the abacus. But instead of sliding wooden beads in a frame, their ABACUS, or Adaptive Background Genotype Calling Scheme, is an automated algorithm that they claim allows researchers to reliably detect thousands of rare and common genetic variations at once.
ABACUS works with the Variation Detection Array, a new type of GeneChip that allows detection of single nucleotide polymorphisms at over 37,000 different sites on the genome.
“We are really just resequencing about 30 kb at a time,” said Johns Hopkins researcher David Cutler. “People have tried to use Affymetrix chips to do this in the past, but it was perceived that you often times got inaccurate answers off the chips.”
The ABACUS is designed to weed out those inaccurate answers, leaving only the “base calls” (determination of the specific base present at a specific site in a sample) that are highly reliable.
Affymetrix plans to commercialize both the Variation Detection Array (VDA) and ABACUS in the near future, said company spokesperson Anne Bowdidge. “This is a next generation microarray,” said Bowdidge. “It is our intent in the future to export these arrays broadly to the research community for polymorphism discovery.”
But in keeping with the company’s new policy of scientific perestroika, which it most recently demonstrated in announcing plans to reveal the sequences on its oligos within the next few months, scientists from Affymetrix and Johns Hopkins’ McKusick-Nathans Institute of Genetic Medicine have already published a paper on this algorithm and the VDA array in the November issue of Genome Research.
In this paper, “High Throughput Variation Detection and Genotyping,” the researchers described how they worked with the inherent features of the VDA, including the pixelated spots and the fluorescence, to develop a method for separating the microarray wheat from the chaff.
Affy’s Next Big Thing?
The VDA, which is almost as densely populated with oligos as an ordinary GeneChip, includes about 300,000 features. Each feature, a 20 x 24 micron rectangle, contains about one million identical 25 base-pair oligonucleotide probes for a specific sequence. Features are arranged in groups of four, and the probes in each feature differ from those in the other three only at the 13th base. This grouping of four enables the array to detect all four variants (A, G, C, and T) of a specific sequence. Each site on the array is covered by two such sets of four features, one for the forward strand of the DNA and one for the reverse strand, for a total of about 37,500 different sites.
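The layout arithmetic above can be checked with a short Python sketch. The feature count, probe length, and 13th-base position come from the figures quoted here; the function name probe_quartet and the example sequence are hypothetical, and the sketch ignores strand complementarity and any control features on a real array.

```python
# Approximate figures quoted in the article.
TOTAL_FEATURES = 300_000
BASES = "AGCT"          # the four variants interrogated at each site
STRANDS = 2             # forward and reverse
PROBE_LENGTH = 25
QUERY_INDEX = 12        # the 13th base, counted from zero

# Four features per strand, two strands per site.
features_per_site = len(BASES) * STRANDS
sites = TOTAL_FEATURES // features_per_site


def probe_quartet(reference):
    """Return four probes that differ only at the 13th base.

    Hypothetical helper for illustration: `reference` is a 25-base string,
    and each returned probe substitutes one of A, G, C, T at position 13.
    """
    if len(reference) != PROBE_LENGTH:
        raise ValueError("expected a 25-base sequence")
    head, tail = reference[:QUERY_INDEX], reference[QUERY_INDEX + 1:]
    return {base: head + base + tail for base in BASES}


if __name__ == "__main__":
    print(features_per_site, sites)
    for base, probe in probe_quartet("ACGTACGTACGTACGTACGTACGTA").items():
        print(base, probe)
```

Eight features per site and roughly 300,000 features give the 37,500 sites cited above.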
Most of the time the correct sequence hybridizes to the probe sequence. But sometimes, cross-hybridization occurs, wherein a sequence similar to the one the probe is designed to detect will partially bind to the probe. Sequences can also bind to the chip surface, creating the problem of “background noise.” Another problem common to Affymetrix chips is feature saturation, where an especially strong signal will saturate all of the pixels of a feature, creating problems with statistical variances.
To account for these issues, the researchers designed mathematical models for the hypothetical “perfect fit” null hypothesis (no hybridization), homozygote (hybridization to the same variant base on both forward and reverse strands) and heterozygote (hybridization to different bases on different strands). These models included formulas for the perfect mean background and variance.
Based on these models, the ABACUS system determines quality scores for each set of eight features in a VDA. The score is a measure of the perfect fit quality score minus the best fit score for the spot, or the difference of the logarithms of the scores, with the reverse strands. A spot is said to be called when one model (null, homozygote, or heterozygote) fits significantly better than the others by statistical measures. If none fits significantly better than the others, then the probe set is thrown out as unreliable.
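The call-or-discard rule described above can be illustrated with a minimal Python sketch. It is not the published ABACUS implementation: the function name call_site, the use of one log score per model, and the min_margin threshold of 2.0 are all assumptions made for illustration, standing in for whatever statistical test the authors actually use.

```python
def call_site(log_scores, min_margin=2.0):
    """Pick the best-fitting model only if it clearly beats the runner-up.

    log_scores: dict mapping a model name ("null", "homozygote",
        "heterozygote") to a log fit score, where higher means a better fit.
        These scores are assumed inputs, not computed here.
    min_margin: hypothetical threshold on the difference of log scores.

    Returns (model, margin) for a call, or (None, margin) when no model
    stands out and the probe set would be discarded as unreliable.
    """
    if len(log_scores) < 2:
        raise ValueError("need at least two models to compare")
    ranked = sorted(log_scores.items(), key=lambda item: item[1], reverse=True)
    (best_model, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    margin = best_score - runner_up_score
    if margin >= min_margin:
        return best_model, margin
    return None, margin


if __name__ == "__main__":
    clear = {"null": -50.0, "homozygote": -10.0, "heterozygote": -30.0}
    murky = {"null": -50.0, "homozygote": -10.0, "heterozygote": -10.5}
    print(call_site(clear))   # a confident call
    print(call_site(murky))   # no call: the top two models are too close
```

Raising min_margin in this sketch trades more discarded sites for fewer wrong calls, which is the same trade-off behind the uncalled fraction the authors report.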
This model also includes a mechanism for correcting against uneven background, a problem that is especially common with heterozygotes. Additionally, it sets standards for determining PCR failure and for eliminating bands of probes that may not have hybridized well, as well as doublets of SNPs.
Calling All Unreliables
The authors tested the approach in an experiment encompassing 32 autosomal and eight X-linked genomic regions, each consisting of approximately 50 kb of unique sequence spanning a 100-kb region, in 40 humans.
In total, the researchers claimed that they were able to identify about 80 percent of the genotype variations with near perfect accuracy. In repeated experiments, they identified 800,000 genotypes identically, validating the accuracy of their calls.
“These results indicate that microarrays can be used for both detection and genotyping of variation simultaneously, and the accuracy of the genotyping approaches or exceeds most other widely available standalone genotyping technologies,” the authors wrote.
But they also acknowledged that the ABACUS procedure, in eliminating so many unreliable calls, left about 13 percent of diploids and 7 percent of haploids uncalled for.
Since authoring the Genome Research article, however, the Hopkins researchers have mostly solved this problem in diploid detection by fixing a step in the array imaging process, Cutler said.
In the Affymetrix imaging software, the user has to fit a grid onto the image that comes through. This step introduces the possibility for human error. “The principal reason the diploids were harder to call was because of very small grid problems,” Cutler said.
By developing computer code to automate this “gridding” step, the Hopkins researchers have been able to reduce the number of failures in the diploid samples. “We’ve gotten to the point where we never use the Affymetrix software for anything,” Cutler said.
Even with this software improvement, the researchers still lose about five percent of the bases, especially in guanine-rich sequences. But Cutler insisted that this technology is ready for prime time. “Just give me a human and let me sequence them,” he said.