Affymetrix recently released a new version of its Genotyping Console software, GTC 3.0, which includes a new algorithm for predicting genotyping performance called Contrast Quality Control.
The metric replaces Affy’s previous QC algorithm, Dynamic Model QC, which was found to be “of limited value for predicting genotyping performance” in certain “problematic” data sets, according to a white paper that Affy released last week.
After tuning the metric for several months, validating it, and setting thresholds and guidelines for best practices to apply it, Affy found the software “to be a better predictor of a sample’s genotyping performance than the DM QC call rate, especially in some rare problematic datasets,” the firm’s principal biostatistician Teresa Webster explained to BioInform in an e-mail interview.
Contrast QC appears to address challenges some users experienced with the DM QC metric.
“Before Contrast QC was implemented, we were finding that we consistently had to throw out about 5 percent of our data that was passing the Affymetrix quality-control measures because it wasn’t passing our own internal standards,” Jennifer Troyer, a researcher at the National Cancer Institute-Frederick’s Laboratory of Genomic Diversity, told BioInform via e-mail.
CQC is “just a better metric to determine the quality of each sample,” she said, adding that so far it has improved quality control in her lab.
Webster explained that CQC is a “complementary method used to qualify experiments as being of sufficient quality to pass through to genotyping algorithms,” such as Birdseed, the default genotyping algorithm for the SNP Array 6.0 platform that was developed by scientists at the Broad Institute of MIT and Harvard.
Contrast QC assesses the data quality from an individual array and predicts its performance in multi-array genotyping “cluster-based” analysis methods, which generate clusters based on the spot intensity variances within the dataset for each SNP.
According to the white paper, the DM algorithm “did not measure the degree to which the [perfect match] intensities of all SNPs cluster by genotype.” In high-quality samples, “the A and B allele probe intensities will display three clusters for the AA, AB, and BB genotypes of the sample,” but in poor-quality samples, these clusters will merge, the paper notes.
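The clustering behavior the white paper describes can be sketched with a toy simulation. The (A − B)/(A + B) contrast transform used below is a common form for mapping allele intensities onto a single axis, not necessarily Affymetrix's exact Contrast QC definition, and all intensity values here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def contrast(a, b):
    # Illustrative contrast transform: maps A/B allele intensities to a
    # value near +1 (AA), 0 (AB), or -1 (BB). This is a common form only;
    # the exact Contrast QC definition is Affymetrix's.
    return (a - b) / (a + b)

# Simulated probe intensities for SNPs of each genotype in a good-quality
# sample (means and spreads are made-up numbers).
aa = contrast(rng.normal(2000, 150, 500), rng.normal(300, 80, 500))
ab = contrast(rng.normal(1200, 150, 500), rng.normal(1200, 150, 500))
bb = contrast(rng.normal(300, 80, 500), rng.normal(2000, 150, 500))

# In a high-quality sample the three genotype clusters are well separated
# along the contrast axis; in a poor-quality sample they merge toward zero.
print(aa.mean(), ab.mean(), bb.mean())
```

Widening the simulated noise collapses the three clusters toward one another, which is the degradation the metric is designed to detect.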
Concluding that “cluster resolution is a better predictor of a sample’s genotyping performance” from cluster-based genotype-calling algorithms, Affy developed CQC in order to “better track” genotyping call rates produced by Birdseed.
Webster’s group compared the ability of CQC and DM QC call rate to predict Birdseed’s performance using more than 6,000 samples representing more than 50 datasets. “These datasets utilized both good and poor quality assay runs, as well as HapMap and customer sample generated data,” she said.
Webster said that in these studies, her group found a “good linear relationship” with a correlation coefficient of approximately 0.75 between CQC and genotyping performance.
“The important point is that we find essentially the same linear relationship across the disparate datasets,” she said. “This means that points for Contrast QC versus genotyping performance lie along essentially the same line. So even though [it is] not perfect, we can draw universal thresholds for acceptable performance, based on the common line.”
In the routine analysis of customer-generated data, Affymetrix scientists found that the correlation coefficient between Contrast QC and Birdseed v2 call rate in single data sets “frequently exceeds 0.90,” Webster said.
Webster said that cluster resolution is “a fundamental property for good genotyping performance.” The connection between Contrast QC and genotyping performance should apply to other genotype clustering algorithms such as BRLMM, the Bayesian Robust Linear Model with Mahalanobis Distance Classifier, she said.
CQC will not work with copy-number discovery algorithms, such as Birdseye and Canary, however. “Contrast QC is not designed to predict copy number performance,” she said, adding that Affy has developed a different metric, called MAPD, for that purpose.
For scientists studying the white paper and wondering how to extrapolate from the Affymetrix validation to their own work, Webster said that “users can easily verify the prediction value in their own lab by clustering a data set of samples with Birdseed, and plotting Contrast QC versus logit (Birdseed Callrate). The linear relationship will then be clear.”
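Webster's suggested check amounts to a logit transform and a straight-line fit. A minimal sketch follows; the per-sample numbers are hypothetical placeholders for values a lab would export from GTC 3.0:

```python
import numpy as np

def logit(p):
    # logit(p) = log(p / (1 - p)); spreads call rates that bunch up
    # near 1.0 so a linear relationship with Contrast QC becomes visible.
    return np.log(p / (1.0 - p))

# Hypothetical per-sample values -- substitute your own exported data.
contrast_qc = np.array([0.4, 1.1, 1.9, 2.6, 3.3])
call_rate = np.array([0.90, 0.95, 0.98, 0.99, 0.995])

# Fit the line Webster describes; slope and intercept vary by dataset.
slope, intercept = np.polyfit(contrast_qc, logit(call_rate), 1)
r = np.corrcoef(contrast_qc, logit(call_rate))[0, 1]
print(slope, intercept, r)
```

Plotting the same two arrays, rather than just fitting them, is what Webster actually recommends; the fit simply quantifies how tight the relationship is.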
Room for Improvement
Several research groups have indicated that Affy’s previous QC metric needed improvement. For example, researchers at the University of Tokyo recently found that tweaking the criteria for DM QC led to much better results than using the default version of the metric.
Nao Nishida, a researcher at the University of Tokyo and first author on a paper describing the project in BMC Genomics, told BioInform in an e-mail that the team decided to use more stringent criteria than the ones recommended by Affymetrix and found that the method reduced the number of false positive results.
In the paper, the researchers evaluated the performance of the SNP Array 6.0 platform with Birdseed v1 and DM QC for genome-wide association studies in a Japanese population.
They found that the average overall call rate “gradually decreased as the sample number increased, presumably due to low-quality samples included in the genotype calling with the Birdseed algorithm.”
The scientists pointed out that the recommended assay criteria to exclude low-quality samples had not always done the trick. “We empirically know that some samples, which pass these criteria, have low-quality genotyping results,” they wrote.
Nishida said that modifying the parameters of the DM QC algorithm produced better results. Using the default metric, “we found 57 SNPs … that were revealed to be false associations,” she said. “Alternatively, when we used the 184 control samples which passed the stringent QC call rate criteria, 72 percent of the false positive associations were removed.”
Nishida said that the University of Tokyo team has begun using GTC 3.0, which contains Birdseed v2 and Contrast QC, and has just “started to evaluate the performance” of those methods.
NCI-Frederick’s Troyer has evaluated Contrast QC and determined that it is an improvement over the prior metric.
“While the cutoff for good versus bad is somewhat arbitrary and is empirically determined, we have found that the new Contrast QC correlates better with our downstream measures of sample quality [in terms of] call rate and heterozygosity,” she said.
DM QC measured the reproducibility of redundant probes for a subset of approximately 3,000 markers to predict the quality of the entire probe set of 906,600 SNPs, she said. “Because the QC markers weren’t randomly chosen and didn’t use the same probe design as the rest of the probes on the chip, they weren’t necessarily the best measure of overall success,” she said.
“From what I understand, [the new metric] is measuring the intensity difference, or contrast, between the genotyping signals at 10,000 randomly chosen markers across the chip,” she said. “This seems to be a much better predictor of overall calling success.”
With the current metric, there is only about a 1-percent failure rate, as opposed to 5 percent with DM QC. “On a practical note, this saves us time and money because Affymetrix will only provide replacement chips for ones that fail their own QC metric, so we were eating the cost of the additional failures,” she said.
Another Approach
Rafael Irizarry, a biostatistician at Johns Hopkins University who is familiar with the Affy platform, did not comment directly on the new metric but praised the “good group of statisticians” at the company and noted that “we have a QC metric that we think works very well.”
Irizarry’s metric is a signal-to-noise ratio that “captures the difference of the intensity values among the genotype clusters across all SNPs of a given chip,” according to a recent paper he and his colleagues published in Genome Biology.
The paper describes Corrected Robust Linear Model with Maximum Likelihood Classification, or CRLMM, which the authors claim to be more accurate than BRLMM or Birdseed. One aspect of the method is the signal-to-noise ratio QC metric, which “is an excellent predictor of chip-specific accuracy,” they wrote.
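A generic signal-to-noise ratio of this kind can be written as the spread of the cluster means divided by the average spread within each cluster. The sketch below uses that generic form with simulated clusters; the exact CRLMM definition is in the Genome Biology paper:

```python
import numpy as np

def snr(clusters):
    # Generic signal-to-noise ratio for genotype clusters: variance of the
    # cluster means (signal) over the mean within-cluster variance (noise).
    # Illustrative form only; see the CRLMM paper for the exact definition.
    means = np.array([c.mean() for c in clusters])
    within = np.mean([c.var() for c in clusters])
    return means.var() / within

rng = np.random.default_rng(1)

# Simulated 1-D genotype clusters at -1 (BB), 0 (AB), +1 (AA).
good = [rng.normal(m, 0.1, 300) for m in (-1.0, 0.0, 1.0)]
bad = [rng.normal(m, 0.5, 300) for m in (-1.0, 0.0, 1.0)]

# A bad array's clusters merge, so its signal-to-noise ratio collapses.
print(snr(good), snr(bad))
```

This is the intuition behind Irizarry's observation that a very bad array can look superficially fine while carrying essentially no genotype information.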
Irizarry noted that SNP microarrays for genome-wide association studies face a number of issues. For instance, genotyping errors can be due to a bad array, bad probes for specific SNPs, or a measurement error on one probe on a single array.
“It is important to detect and down-weight unreliable calls from the analysis pipeline,” he said. “Bad arrays or bad hybridizations are one way in which calls can become unreliable.” In his view, algorithms need to be able to identify these unreliable calls and “treat them appropriately.”
“A very bad array may deliver zero information to scientists in an experiment,” he said. “But if you don’t know that, you might think you have some accurate [heterozygote] calls.”
While genotyping algorithms have evolved, they continue to be “prone to drop heterozygous calls,” he said.
CRLMM is available through the oligo package in Bioconductor.
Finding Quality
The issue raised in the Irizarry paper “is one of genotype calling for each SNP and the effect that even small mistakes in frequency calls can have on downstream association analyses,” Troyer said.
With a million SNPs, even with a 99-percent call rate for an individual, there would be 10,000 SNPs that didn’t work for a given sample, she explained, adding that if those same SNPs are problematic for multiple samples, they can potentially lead to false positive signals.
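Troyer's back-of-the-envelope arithmetic is straightforward to verify:

```python
# A 99% call rate on a million-SNP array still leaves 10,000 SNPs
# uncalled in every sample.
total_snps = 1_000_000
per_sample_call_rate = 0.99

failed_per_sample = round(total_snps * (1 - per_sample_call_rate))
print(failed_per_sample)  # 10000
```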
“The only way to address this problem is to improve clustering algorithms and measures of SNP confidence,” she said. While “any clustering algorithm will work better with a pristine data set,” Troyer said that Contrast QC does a better job than DM QC at reducing the amount of bad data included in the study.
Webster said that Contrast QC is essentially the “one-dimensional analogue” of the signal-to-noise ratio developed by Irizarry and his colleagues. “Both metrics are measuring the same property — which is the overall separation of the A and B signals into genotype clusters, using [expectation maximization] fitting,” she said.
QC metrics are important for genotyping analysis because they exclude low-quality samples prior to clustering, but beyond this role, Contrast QC and Irizarry’s metric have no direct relationship to genotype clustering performance, Webster said.
“In general, algorithmic performance is a moving target and we are glad to see high-quality genotyping becoming optimized over time as new software options become available to our customers,” she said.