Looking to explain the discordance sometimes observed in the results of independent genome-wide association studies, a team of researchers from multiple government, academic, and corporate entities recently assessed the ability of laboratories to obtain similar genotypes using different microarray platforms.
The research team reported its findings in a paper published this month in PLoS One, noting foremost the high concordance of genotypes obtained using the Affymetrix and Illumina microarray platforms.
The researchers also found that while arrays with low-quality data were detected when comparing genotyping data from technical replicates, the same low-quality data could not be detected using vendors' quality control suggestions — indicating "the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results."
Taking their finding a step further, the authors simulated the impact of discordant genotypes on association analysis results, and suggested that discordant genotypes caused by low-quality array data could "explain, at least in part, the irreproducibility of some GWAS findings when the effect size and the minor allele frequencies are low."
"Most genotyping technologies are reproducible if the genotyping experiments are conducted properly," corresponding author Huixiao Hong told BioArray News this week. At the same time, Hong said that technical replicates "help the quality control of genotyping data" and therefore could "improve the reliability of GWAS findings."
Hong is a researcher in the division of bioinformatics and biostatistics at the US Food and Drug Administration's National Center for Toxicological Research in Jefferson, Ark., the same center that has helped coordinate the multiple phases of the Microarray Quality Control project, or MAQC.
Created in 2005, the MAQC project has in the past evaluated the reproducibility of expression microarray experiments across different labs and platforms, investigated sources of bias in array-based studies, and, most recently, focused on assessing the technical performance of next-generation sequencing platforms (BAN 8/3/2010).
Some of the authors of the PLoS One paper have played significant roles in the MAQC project, among them NCTR researchers Leming Shi and Weida Tong. When asked if the new paper was an outcome of the MAQC project, Hong answered "yes and no," stating that it evolved out of the MAQC collaboration, but is also the result of a separate genotyping project.
"At the early stage of MAQC, we focused on DNA microarray technology as the FDA received many gene expression data sets," said Hong. "By 2008, more and more pharmacogenetics data sets including GWAS data were submitted to the FDA, so … hand-in-hand with the MAQC project, we initiated a project to address issues in GWAS," he said.
In the study, the researchers sought to assess the inter-laboratory and inter-platform reproducibility of genotypes by comparing genotyping results of four technical replicates for six subjects across two different SNP arrays, Affy's SNP 6.0 Array and Illumina's Infinium Human1M-Duo BeadChip, in five different laboratories.
According to Hong, team members held multiple teleconferences and face-to-face meetings "to discuss the study design, technical issues, and strategies for data analysis." The samples were obtained and prepared at NCTR, randomized, and sent to the genotyping sites with the reagents. Expression Analysis, Northwestern University, the Samsung Advanced Institute of Technology, Beckman Coulter Genomics, and the Center for Molecular Medicine performed the genotyping, Hong said. After that, the raw data was sent back to NCTR for distribution to various data-analysis teams. Following the analysis, the results were sent back to NCTR for discussion.
The results demonstrated high concordance between the Affy and Illumina arrays. According to the paper, the researchers found genotype concordance of between 99.40 percent and 99.87 percent within a laboratory for the same platform, between 98.59 percent and 99.86 percent across laboratories for the same platform, and of 98.80 percent across genotyping platforms.
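For readers unfamiliar with the metric, genotype concordance between two technical replicates is commonly computed as the share of SNPs that receive the same call in both. The sketch below illustrates that general idea only; it is not the authors' pipeline. The function name, the "NC" no-call marker, and the choice to skip SNPs that are uncalled in either replicate are assumptions made for illustration, and the genotype calls are toy values rather than data from the study.

```python
def genotype_concordance(calls_a, calls_b, missing="NC"):
    """Fraction of SNPs with identical calls in two replicates.

    Assumption: SNPs with a no-call in either replicate are excluded
    from the comparison rather than counted as discordant.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")

    compared = 0
    matching = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue  # skip SNPs not called in both replicates
        compared += 1
        if a == b:
            matching += 1

    if compared == 0:
        raise ValueError("no SNPs were called in both replicates")
    return matching / compared


# Toy calls for six SNPs (hypothetical values, not study data).
replicate_1 = ["AA", "AB", "BB", "AB", "NC", "AA"]
replicate_2 = ["AA", "AB", "BB", "AA", "AB", "AA"]

concordance = genotype_concordance(replicate_1, replicate_2)
print(f"concordance: {concordance:.2%}")
```

In this toy case, five SNPs are called in both replicates and four of them agree. How no-calls are handled is a design choice that can shift the reported figure, so it should be stated alongside any concordance number.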
However, the researchers found that vendors' QC suggestions "might not be sufficient to assure data of adequate quality." Specifically, they found that while some array data met the QC criteria according to Affy's guidelines, it had a higher heterozygous rate than corresponding subject replicates, leading the researchers to deem the data quality to be too low and exclude the data from the study. Noting that most genetic markers identified in GWAS confer very small relative risks, the authors stated that a very small error in genotypes could be inflated in GWAS and might generate false associations.
Further exploring the impact of such genotyping errors, the researchers simulated the study results with a control population of 3,000 samples and a case population of 2,000 samples.
They found that the smaller the minor allele frequency and the lower the concordance in genotypes, the larger the spurious odds ratio. This led the authors to conclude in the paper that a "very small discordance in genotypes caused in genotyping could change odds ratios of genetic markers and affect the final conclusions of GWAS."
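The direction of that effect can be reproduced with a back-of-the-envelope calculation. The sketch below is not the authors' simulation, whose details are not described here. It assumes a marker with no true association, genotyping errors that affect only the case group, and a simple error model in which each allele in cases is miscalled as the other allele with a probability equal to one minus the concordance. It works with expected allele counts rather than random draws, and borrows only the 2,000-case and 3,000-control sample sizes from the study description. The function name and parameters are hypothetical.

```python
def spurious_odds_ratio(maf, error_rate, n_cases=2000, n_controls=3000):
    """Expected allelic odds ratio for a marker with no true association.

    Assumptions (illustrative only):
      - the true minor allele frequency `maf` is the same in cases and controls
      - controls are genotyped without error
      - each allele in cases is flipped to the other allele with
        probability `error_rate`
    """
    # Observed minor allele frequency in cases after symmetric miscalling.
    observed_case_maf = maf * (1 - error_rate) + (1 - maf) * error_rate

    # Expected allele counts (two alleles per sample).
    case_minor = 2 * n_cases * observed_case_maf
    case_major = 2 * n_cases * (1 - observed_case_maf)
    control_minor = 2 * n_controls * maf
    control_major = 2 * n_controls * (1 - maf)

    return (case_minor * control_major) / (case_major * control_minor)


# Compare a rare and a common marker at two assumed concordance levels.
for maf in (0.05, 0.30):
    for concordance in (0.99, 0.98):
        odds_ratio = spurious_odds_ratio(maf, 1 - concordance)
        print(f"MAF={maf:.2f}  concordance={concordance:.2f}  OR={odds_ratio:.3f}")
```

Under these assumptions, the odds ratio drifts further from 1 as the minor allele frequency falls and as concordance drops, even though the marker has no real effect. A different error model, for example one that inflates heterozygous calls, would give different magnitudes.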
According to Hong, the findings have ramifications for researchers in that replication is often not included in GWAS protocols, "raising the risk of inducing errors in different steps of a complicated process." Using more technical replicates in studies could enable researchers to "detect, prevent, and eradicate sources of technical errors and biases in genotyping," he said, and, ultimately, "improve the quality of genotype data and confidence in GWAS results."