NCI Team Recommends Two-Platform Design for New Genome-Wide Association Studies


A team at the US National Cancer Institute has developed a "happy compromise" for researchers who want to run array-based genome-wide association studies on high-density chips but cannot afford to genotype the thousands of samples required to reach statistically significant findings.

Rather than running whole-genome genotyping arrays containing common variants on a large number of samples and then imputing the results using publicly available datasets, the NCI research team recommends genotyping all of the study samples using these inexpensive arrays, and then analyzing a subset of the samples with one of two approaches: more expensive higher-density arrays containing rare variants or next-generation sequencing.

The team then calls for imputing the missing genotypes for those participants not genotyped on the denser platform and performing the association study on the augmented dataset. "Instead of depending only on a public dataset, the imputation reference set now includes a genotyped subset of the study population," they noted in a paper outlining the approach, published this month in Genetic Epidemiology.

Lead author Joshua Sampson told BioArray News that as genotyping approaches improve, investigators face a number of questions. "If the study has already genotyped a cohort on an older platform, they have to decide whether it's worth re-genotyping their population using the improved technology," Sampson said. On the other hand, "if a study is about to genotype a cohort, they have to decide whether using the bigger, better, and more inclusive genotyping platform is worth the additional cost," he said.

"Our recent paper attempts to show that the choices need not be so black and white," he added. "Imputation allows a happy compromise in our two-platform design."

Sampson is a biostatistician at the division of cancer epidemiology and genetics at NCI in Rockville, Md. Other authors on the paper include Kevin Jacobs, Zhaoming Wang, Meredith Yeager, Stephen Chanock, and Nilanjan Chatterjee. Chanock, who is chief of the laboratory of translational genomics at NCI, discussed association studies in an interview with BioArray News last year (BAN 3/1/2011).

Affymetrix and Illumina continue to develop high-density chips containing rare variant content for association studies. For instance, Illumina last year launched the Omni5, which contains nearly 5 million markers. The arrays were largely designed using rare variant content from the 1000 Genomes Project and other sources to detect uncommon susceptibility SNPs with minor allele frequencies of between 1 percent and 10 percent. Sampson and his colleagues set out to reduce the cost of such studies by avoiding genotyping a large number of participants with expensive technologies.

In the paper, the NCI team argued that using high-density arrays in combination with next-generation sequencing in large association studies is "prohibitively expensive" for most researchers. The "more economical alternative" is to use less-dense arrays to genotype the study samples and then rely on an imputation procedure trained on a publicly available database to estimate the missing genotypes. But, as the authors warn in the paper, "if the ancestry of the study population is not adequately represented in the database, the imputation accuracy for uncommon SNPs can be less than ideal and confound study results."

Their compromise method calls for using a standard genotyping array to genotype all of the study samples, and then supplementing that data by genotyping only a small proportion of the participants on a platform that has higher coverage for uncommon SNPs. This subset of the study population is then included as part of the imputation reference set.
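
As a rough illustration of why adding study participants to the reference set can help, the toy sketch below imputes dense-array genotypes by copying them from the closest-matching reference sample. This is not the procedure used in the paper; production imputation tools rely on haplotype-based statistical models. The function names, the nearest-neighbor rule, and the genotype values are all hypothetical and chosen only to show the idea.

```python
import numpy as np

def build_reference(public_sparse, public_dense, subset_sparse, subset_dense):
    """Stack a public panel with the doubly genotyped study subset (hypothetical helper)."""
    return (np.vstack([public_sparse, subset_sparse]),
            np.vstack([public_dense, subset_dense]))

def impute_dense_markers(study_sparse, ref_sparse, ref_dense):
    """Toy imputation: for each study sample, copy the dense-only genotypes
    from the reference sample that differs at the fewest shared markers."""
    # Mismatch counts, shape (n_study, n_reference).
    mismatches = (study_sparse[:, None, :] != ref_sparse[None, :, :]).sum(axis=2)
    nearest = mismatches.argmin(axis=1)
    return ref_dense[nearest]

# Made-up genotypes coded as 0/1/2 copies of the minor allele.
public_sparse = np.array([[0, 0, 0]])   # public panel, markers shared by both arrays
public_dense = np.array([[0]])          # public panel, marker only on the dense array
subset_sparse = np.array([[2, 2, 2]])   # study subset genotyped on both platforms
subset_dense = np.array([[2]])
study_sparse = np.array([[2, 2, 1]])    # participant genotyped on the cheaper array only

public_only = impute_dense_markers(study_sparse, public_sparse, public_dense)
ref_sparse, ref_dense = build_reference(public_sparse, public_dense,
                                        subset_sparse, subset_dense)
augmented = impute_dense_markers(study_sparse, ref_sparse, ref_dense)
```

In this contrived case, the public panel alone has no sample resembling the participant, so the copied genotype comes from a poor match; once the study subset joins the reference, a closer match becomes available.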

In the paper, the team evaluated the potential efficiency of the two-platform approach using a dataset containing 756 individuals genotyped on both the Illumina HumanOmniExpress and Omni2.5-Quad, which contain roughly 900,000 and 2.5 million markers, respectively.

While the authors acknowledged that genotyping all individuals on a denser array "would be ideal," they found that genotyping only 100 individuals on the array, in combination with imputation, leads to "only a modest loss of power for detecting associations."

More specifically, they argue that it could be possible to observe more than 80 percent of the detectable associations with as few as 100 subjects genotyped on the higher-density chip, an increase of between 5 percent and 10 percent over the percentage possible when basing imputation only on a public reference set.

At the same time, they noted that if the relative risks for rare variants are significantly larger than those previously observed for common variants, then the proportion detected would likely be lower, concluding that "this same evidence cautions against depending on imputation if rare variants are found to have large relative risks."

According to Sampson, one could genotype "only a small fraction, perhaps just 1 percent of a cohort, on the bigger platform" as part of the two-platform approach. Then the remainder of the cohort could be genotyped on the lower-density platform with imputation used to fill in the difference.
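
The budget arithmetic behind that suggestion can be sketched as follows. The per-sample prices and the cohort size are hypothetical placeholders, not figures from the paper or from any vendor, and the sketch assumes every participant is run on the cheaper array while the subset is additionally run on the denser one.

```python
def two_platform_cost(n_samples, dense_fraction, cost_sparse, cost_dense):
    """Cost when all samples get the cheaper array and a fraction also gets the denser one."""
    n_dense = round(n_samples * dense_fraction)
    return n_samples * cost_sparse + n_dense * cost_dense

def all_dense_cost(n_samples, cost_dense):
    """Cost when every sample is genotyped on the denser array."""
    return n_samples * cost_dense

# Hypothetical inputs: 10,000 participants, 1 percent on the denser array,
# $100 per sample on the cheaper array and $400 per sample on the denser one.
two_platform = two_platform_cost(10_000, 0.01, 100, 400)   # 1,040,000
dense_only = all_dense_cost(10_000, 400)                   # 4,000,000
```

Under these assumed prices, the denser array adds about 4 percent to the cost of genotyping the cohort on the cheaper array alone.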

"The key point is that the small fraction of the cohort genotyped on the larger platform allows the imputation model to be trained on one's own cohort," Sampson said. "This guarantees that the training set includes adequate representation for the desired population," he said. "By genotyping just a hundred individuals on that larger platform, study power can be increased by [between] 5 [percent] and 10 percent, as compared to when only a public reference dataset is available."

The two-platform design is appropriate "whenever two different genotyping methods are available with one method being more inclusive, but more expensive," the authors wrote in the paper. They also noted that while the analysis was presented on the OmniExpress and Omni2.5, the results could be "generalized to other genotyping platforms and eventually next-generation sequencing studies once the quality of calling algorithms has stabilized."
