Q&A: NCI's Stephen Chanock on the Three Waves of Genome-Wide Association Studies


By Justin Petrone

Name: Stephen Chanock

Title: Chief, Laboratory of Translational Genomics, National Cancer Institute

Background: 2007-present, chief, Laboratory of Translational Genomics, National Cancer Institute, National Institutes of Health, Bethesda, Md.; 2005-present, co-leader, Cancer Genetic Markers of Susceptibility Project, NCI; 2001-present, director, core genotyping facility, NCI; 1998-2006, assistant professor of pediatrics, Johns Hopkins School of Medicine, Baltimore; 1993-1998, assistant professor of pediatrics, University of the Health Sciences, Bethesda, Md.; 1991-2001, senior staff fellow, pediatric oncology branch, NCI; 1989-1991, instructor, pediatrics, Harvard Medical School

Education: 1983-1989, internship, residency, and fellowship in medicine in pediatrics, Children's Hospital Boston; 1983, MD, Harvard Medical School; 1978, BA, Princeton University

Do a PubMed search for the name Stephen Chanock, and you'll come away with more than 400 hits, most of them papers about genome-wide association studies, all of them published over the past decade.

In February 2011 alone, he was listed as a co-author on 10 GWAS-related publications concerning diseases as diverse as prostate, breast, and bladder cancer.

As chief of the Laboratory of Translational Genomics at the National Cancer Institute and co-leader of NCI's Cancer Genetic Markers of Susceptibility, or CGEMS, Project, Chanock has played a role in a great number of genome-wide association studies, experience that has given him insight into how the approach will evolve in coming years.

While most array vendors are anticipating a second wave of GWAS based on the availability of higher-density and more customizable chip formats, Chanock believes that GWAS have already passed through two waves: an initial round of studies followed by a second phase of data pooling and meta-analysis.

According to Chanock, investigators are now poised to pursue a third wave of GWAS powered by the next generation of arrays, followed by next-generation sequencing or a focus on genotyping studies to validate their discoveries.

All of this will take time, though, meaning that, in Chanock's mind, there is a "long way to go" before information obtained in association studies can be applied clinically.

BioArray News last week spoke with Chanock about these and other topics pertaining to GWAS. Below is an edited transcript of that interview.

Can you give an update on your activities?

Well I am about to go to Paris for four months on sabbatical, but I will keep working. We are still very engaged in both first- and second-generation GWAS studies. We have a strong commitment to map and try to understand the biological basis of many of the regions that harbor common variants and uncommon variants that are associated with risks for different types of cancers. So we are still primarily in the discovery phase because until recently most studies weren't large enough to provide adequate statistical power to use the agnostic approach in identifying a comprehensive list of common variants. Whereas for the clinical outcomes, the size and scope of these studies has lagged behind those designed to discover etiologic risk variants and consequently will take time to get to that point. It is still a very exciting time to identify regions and characterize them. This gives us very important biological insight into how and why particular cancers or features common to several cancers contribute to the risk of these complex diseases. We know that the genetics of cancer rarely can be attributed to a single Mendelian-like mutation. Even in the highly penetrant mutations like BRCA1 and BRCA2, there are a number of important environmental and genetic modifiers that influence the risk and timing of cancer, whereas for sporadic cancers that appear in the general population, the evidence points towards complex disease paradigms, in which sets of variants and environmental factors contribute.

And you are taking part in this as the co-director of CGEMS?

Over time, the concept of CGEMS has morphed. It is still used for the registered access to individual genotype data so that registered users who can meet the criteria and willfully agree not to seek to identify the patients can pursue very well-defined projects. But that NCI initiative that began in 2005 and 2006 has really given rise to a generation of studies in the cancer world. Some that are being conducted in the intramural program, where I am, and many that are being conducted elsewhere, in the US and in Europe. Seeing the proliferation of these studies in different populations gives us an opportunity to look at factors that may be important in one environment versus another, or certain genetic aspects may be interesting in certain populations, while the differences in populations give us the opportunity to map and understand where the functional contributions reside.

Most vendors refer to the coming wave of studies as a second wave, but you seem to think there are more.

I actually think there are three waves, and then there is post-GWAS. I think the first wave was the first-generation studies using the chips that were commercialized and designed between 2006 and 2010. There the scope was limited to common variants, 10 percent or greater in the minor allele frequency in the general population. The first generation was small studies, though considered large at the time, that had begun to gather up the low-hanging fruit. The deliberate paradigm that has emerged is one of a sequence of a genome-wide scan followed by a replication effort.

The second wave is a convergence of many of these studies, now put together in meta-analyses. Whether it's looking at putting together sets of lung cancer or prostate cancer or looking at risk factors for cancer, body mass index or tobacco use, the very large meta-analyses have used these first-generation chips. I think that's really a second wave that's an outgrowth of the first wave.

The third wave, in my mind, will rely on the new generation of chips that are denser; there's more content that takes you to a lower, minor-allele frequency. If we look at what we have identified by GWASs so far, the vast majority of the hits are 10 percent or greater in the general population. The new chips are targeting one to 10 percent, but due to power considerations, which is the direct reflection of the number of samples that are available, in the near future, we expect to be less and less comprehensive in what we find as we go to the lower, minor-allele frequencies.

So I think what the vendors are calling second generation, what I am calling third generation, is going to get us to the 10- to 3-percent range. Going below 3 percent, I think we'll only find a small smattering of those, mainly because of the power considerations.

In a sense, we are just on the verge of a new effort to look at genetic variation using these new chips. The new chips are very helpful, but we have also learned a tremendous amount using the first- and second-wave data together with the 1000 Genomes Project as well as other sequencing data combined with HapMap to generate resources for imputation. This is really looking at predicted structures of haplotypes to estimate, with high probability the more frequent they are, that these exist in the population. So, we can assess untested variants that are being tested, so to speak, in association studies. They require a follow-up genotype to be sure that the imputed SNP is true. We have plenty of examples where there is alarming discordance between what is imputed and what is genotyped. Sadly, many of these don't get published.

So what is post-GWAS?

Post-GWAS breaks off at any place where we conclusively find a variant at a high statistical threshold and with a low probability of being a false positive, one that has now been deemed to be of genome-wide significance. We do throw out some things that may be true in the analysis process. The flipside is we have a very small likelihood of identifying a variant with genome-wide significance that turns out to be a false positive. The genome-wide association study is really just the beginning. We now have our hits of places in the genome and it will take extensive effort on a more individual level; you can't use these large-scale technologies and just say, 'I can scan and tell you why FGFR2 or 8q24 are important for breast or prostate cancer.' No. You have to go at it the old-fashioned way. You have to map that region, know every variant, and then functionally see what story you can pursue that explains why there is a direct association with one or more variants in that region. That is really the post-GWAS scenario. It is more arduous.

The extraordinary thing about GWASs is you have the intersection of the epidemiology world with collections of biospecimens in many good studies, and some not-so-good studies that have now begun to intersect with the geneticists who, as a consequence of the Human Genome Project, possess comprehensive annotation data, and then technologies that allow you to conduct massively parallel genotyping. And so you can run these genome-wide association scans on sufficiently large numbers of subjects and worry less about the epidemiological rigor for etiology studies and in return you are able to find these regions, but each region has to be prosecuted individually. There is no chip that tells you, 'Ah, here's how to explain all the regions.' To me this is the big philosophical challenge. There are nearly a thousand regions that are listed in the NHGRI catalog. There are more than 130 cancer regions. Only a handful of the cancer regions have begun to be explored and at least published as to how and why they are important for cancer development. So there is going to be a progressively larger and larger lag as we find more but are slower to explore what we find.

How can the lessons of the first and second wave of GWAS be applied to the third wave?

The history of genetics is about thresholds. The new chips give us many more SNPs, but the consequence is you need more individuals to carry through with your replication if you haven't scanned them initially. Numbers are everything. When you have allele frequency of 2, 3, and 4 percent, unless the effect size is so strong, you have to have larger numbers of individuals. I think the lesson for the third wave is the absolute primacy of collaborations and convergence of groups that have similar or identical phenotypes to study. And the second wave has helped to do that. Look at the study of body mass index where 140,000 scanned individuals have been assembled. That's spectacular. Five years ago it would be unimaginable. Now people who were previously archenemies, at each other's throats, are working together. But this is still the tip of the iceberg. There are still a number of steps that we have to think about. And that is that when we go to genotyping or sequencing even less frequent variants, the agnostic association testing is going to either require a larger sample size, or alternatively what I think is going to happen is doing this in very special populations or family settings and gambling in the move to functional work, sooner rather than later. In genome-wide association studies waves one and two, you waited until you had really found something of genome-wide significance before you would invest substantial time in the regions. When you start doing exome-scale sequencing, you have these very promising hits, and we just don't have the numbers to carry the replication through to reach the results agnostically. We are going to have to bring function in sooner to be able to prioritize and decide how and what variants are worth pursuing.
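The point that "numbers are everything" at low allele frequencies can be illustrated with a rough power calculation. The sketch below is not a method described in the interview. It assumes a simple allelic case-control test with equal numbers of cases and controls, the standard two-proportion sample-size formula, the conventional genome-wide significance level of 5e-8, 80 percent power, and a hypothetical per-allele odds ratio of 1.2. The function names are made up for this illustration.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and a per-allele odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)


def cases_needed(maf, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls) for an allelic test.

    Uses the normal-approximation sample-size formula for comparing two
    proportions, counting two alleles per individual. All defaults are
    illustrative assumptions, not values taken from the interview.
    """
    p0 = maf
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per individual


if __name__ == "__main__":
    # Hypothetical odds ratio of 1.2 across a range of minor allele frequencies.
    for maf in (0.02, 0.03, 0.04, 0.10, 0.30):
        print(f"MAF {maf:.0%}: about {cases_needed(maf, 1.2):,} cases")
```

Under these assumptions, the required number of cases climbs steeply as the minor allele frequency falls from 10 percent toward 2 percent, unless the assumed effect size is much larger.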

For this third generation, the vendors have made their offerings more customizable. Users will be able to design more population-specific chips. What impact will this have on the coming wave of GWAS?

I think it's great. I have been one of the people banging on the vendors' doors for a long time saying that they have to be more nimble and not so monolithic in terms of what the chip content is. Still, I think there has to be a base, a scaffold on which all of this takes place, so there is a common element for the common variants, but when you get to the less-common variants, you start to get into more population-specific issues. The set that may be important for individuals of African ancestry may look different compared to those who are of European ancestry or Asian ancestry. So the availability of greater flexibility is a major step forward, so long as we do not leave behind the scaffold on which these are built. Because if we go to everything being individualized and not having a set of common variants then it will be hard to come back to the interesting questions, such as how and why do certain regions influence heart disease, melanoma, and height and weight. We do have to have the backbones in place.

Where does sequencing fit in the context of GWAS?

Soon GWAS is going to morph into sequencing. As a technology, it's more efficient, both the capture and the short-segment sequencing in a massively parallel, cost efficient way. I think in the next two to three years we are going to see a dramatic shift towards more and more sequencing for both common and uncommon variants, as we already see in the search for rare alleles.

Researchers consistently cite data analysis as a bottleneck in prosecuting a GWAS. Have you seen data analysis tools improve recently?

We do research and the emphasis is on the first two letters: 'r' and 'e.' We do things over and over in order to get things more efficient and discover more and to solidify those things that we had sensed or observed but couldn't be defined conclusively before. So the analytic tools continue to improve. It's really an evolutionary process. It's very difficult to step back and say, 'I'm going to design something that is going to take us all the way to the end of understanding genetics.' We keep going through iterations, and in my mind, that's a good thing. If we had been too rigid, we would have missed a lot of the good things that have come out of GWAS; the fact that we see a lot of aneuploidy in the population, for instance. So our tools and analytics are getting better and better, but they still have a distance to go. The ultimate way would be to sequence everybody on Earth and have everybody phenotyped and have one giant Watson spew out what we understand on risks and outcomes. That's never going to happen.

There have been some critiques of the GWAS approach. Some have argued for smaller, sequencing-based studies.

Of course, with the chips available we have only interrogated a certain portion of the genetic variant space, and that can't possibly explain everything. So the Nature paper that appeared last year on the missing heritability was to be expected. There was no great surprise in that. To think that all the variants 10 percent or greater would explain diabetes or breast cancer was an example of Pollyannaism. Genetics is complicated, and there is no question that smaller studies focused on extreme outcomes are very important and are helping to fill in a complex space. GWAS is just part of it. It has discovered a tremendous number of things. Now the hard part is the post-GWAS, the characterization of what has been discovered. I do think that GWAS was oversold when it first came along, and there was this sense that we had genetic variation in our hands and we could figure everything out, when we only had a part of genetic variation. For instance, I don't think we adequately understood the scope and sheer magnitude of genetic variation. As the 1000 Genomes Project and similar efforts come online, there is continual amazement that there is a lot more variation than we ever thought, and this includes structural variation.

What is the funding environment for GWAS at the moment?

Times are getting financially tight in the US, and the research dollar is more difficult to find. That forces investigators to be more careful and thoughtful and hopefully more rigorous in how they pursue studies. When we started GWASs, we did lots of things that now, when we look back, we ask if we should have done it that way. We didn't know at the time the value of study design and power calculations in such a concrete way, but the terrain was unknown and we moved as best we could, learning and retooling at each step. Now the lessons we have learned are helpful in determining how we strategically perform the best next-generation studies. Those questions will continue to evolve, and may even be different a year and a half from now.

How long will it take before GWAS findings can be applied in a clinical setting?

I think there is still a long way to go. I am conservative in that I think we have to be very careful in how and what way we apply this information. I think there has been too fast an attempt to democratize this and believe that everyone should be tested for these variants, when in fact it's only just a portion of that genetic space that contributes to disease and outcomes of disease. And we have to be very careful. The US Food and Drug Administration has been very careful about the drugs that are approved and their indications. I don't see why we shouldn't be equally careful about how and what way we apply genetics to the future of individualized therapy. We know so little of how to communicate these complex paradigms to patients. The clinician doesn't walk in and say, 'Here are the 19 SNPs and you've got 14 of them, so that explains it.' These are complex issues to convey, particularly regarding risk. These are things that we are still beginning to appreciate, and the idea that because we found a set of SNPs in a GWAS they can be used in therapy is a bit premature and very dangerous. I think we have to be cautious. GWAS is about discovery, moving into characterization, but for risk assessment, it still has a long way to go. We would love to see that come to fruition, but I think that will take more time than most expected before it takes place.
