Genomics at the Population Scale


The application of genomic technologies to population studies has been a boon for epidemiology. Since 2005, when the first small-scale genome-wide association studies appeared, several consortia have identified robust and reproducible hits for disease risk across the human genome in increasingly large cohorts.

A recent success story is schizophrenia: three papers published in Nature in June of last year reported genome-wide association studies that identified common variants for the complex disorder on chromosome 6. Each study aggregates the work of dozens upon dozens of investigators, a trend experts say the research community is adopting enthusiastically. Investigators are also getting the most out of their data: sharper analytical tools and data exchange among institutions are making GWAS more cost-effective.

The Tobacco and Genetics Consortium has adopted both practices. The collaborative research team's GWAS meta-analysis of smoking will be published in Nature Genetics this spring; the paper, written under the TAG Consortium byline, includes the work of more than 120 authors, contributions from two additional consortia, and data mined from 16 separate GWAS.


Helena Furberg, a cancer epidemiologist, was beginning the second year of her National Cancer Institute K07 Career Development Award, which supported her work on the genetic epidemiology of nicotine dependence under the mentorship of Patrick Sullivan, a psychiatric geneticist and epidemiologist at the University of North Carolina at Chapel Hill, when the GWAS wave hit.

"In 2007, we really started seeing a lot of GWAS publishing their findings for traits like diabetes, heart disease, and different cancers," Furberg says. "It occurred to me that a lot of these studies had collected smoking as a covariate risk factor for their disease, but nobody was probably studying smoking in and of itself."

So, as she would do many times over the remainder of the project, Furberg got on the phone and contacted the principal investigators and data managers of numerous GWAS. She recruited researchers, and heaps of data, from 16 disparate studies whose only connection was that they had collected questionnaires on smoking as a covariate risk factor for the diseases they were funded to study. Furberg says her collaborators were enthusiastic about the project from the start because they saw it as "a really efficient use of their really expensive data."

An internal grant of $200,000 from UNC's Lineberger Comprehensive Cancer Center, drawn from taxes on cigarette sales in the state, paid for a postdoc and a project manager. It also bought the group space on the Genetic Cluster Computer at Vrije Universiteit in Amsterdam. With Sullivan's support, and a coincidentally appropriate acronym ("in brainstorming about what we could call [the effort], I thought it was nice that it worked out T, A, G, and C," Furberg says), the TAG Consortium was formed.

The group mined the studies' surveys and settled on four phenotypes it could assess across all 16 of them: smoking initiation ("ever" versus "never" smoking), which included all participants, and, among smokers, the age at which a person began smoking, the number of cigarettes smoked per day, and smoking cessation. To reduce misclassification, Furberg says, participants identified as former smokers had to have quit for more than a year. The combined sample size was more than 74,000.
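The phenotype definitions above can be made concrete with a small coding function. This is a hypothetical sketch: the questionnaire fields (`status`, `age_started`, `cigs_per_day`, `years_since_quit`) are invented for illustration, since the 16 studies used different instruments, and leaving recent quitters missing for the cessation phenotype is an assumption, not a documented TAG rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical questionnaire fields; the real studies used varied instruments.
    status: str                          # "never", "former", or "current"
    age_started: Optional[int] = None
    cigs_per_day: Optional[float] = None
    years_since_quit: Optional[float] = None


def code_phenotypes(p: Participant) -> dict:
    """Derive four TAG-style smoking phenotypes for one participant (illustrative)."""
    ever = p.status in ("former", "current")
    phenos = {
        "ever_smoked": int(ever),        # initiation: defined for every participant
        "age_of_onset": None,
        "cigs_per_day": None,
        "former_smoker": None,           # cessation: former (1) versus current (0)
    }
    if not ever:
        return phenos                    # remaining phenotypes apply to smokers only
    phenos["age_of_onset"] = p.age_started
    phenos["cigs_per_day"] = p.cigs_per_day
    if p.status == "current":
        phenos["former_smoker"] = 0
    elif p.years_since_quit is not None and p.years_since_quit > 1:
        phenos["former_smoker"] = 1      # quit for more than a year
    # Assumption: quitters of one year or less stay missing for cessation
    return phenos
```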

From there, the team devised a uniform analysis plan for every study, one that accounted for potential population stratification and adjusted for disease status. Because some of the 16 studies had genotyped on different platforms, the researchers used imputation to derive a common set of 2.2 million markers for analysis.
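The article does not say how TAG pooled the per-study results. A common choice for this kind of GWAS meta-analysis is inverse-variance weighted fixed-effects pooling, sketched below for a single SNP under that assumption: each study contributes an effect estimate and standard error from the shared analysis plan, and more precise studies receive more weight.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(betas: Sequence[float],
                       ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    Returns (pooled beta, pooled standard error, two-sided P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]        # more precise studies count more
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal P value
    return beta, se, p
```

Usage, with made-up numbers: `fixed_effects_meta([0.2, 0.0], [0.1, 0.2])` pools to a beta of 0.16, because the first study's smaller standard error gives it four times the weight of the second.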

Sullivan says that the team "confirmed the genome-wide hit on chromosome 15 that's smack in the middle of three nicotinic receptors," originally identified by three separate groups in 2008. "And we found new loci for initiation and cessation, but nothing for age of onset," he says.

Collaborate rather than compete

As TAG began to cull information from its massive data set, Furberg and Sullivan discovered that two other consortia were working on similar projects with smaller cohorts. Thorgeir Thorgeirsson of Decode Genetics was heading up the ENGAGE Consortium's GWAS for smoking, and Jonathan Marchini of Oxford University was leading a similar genetic association study in collaboration with GlaxoSmithKline.

"Rather than the three of us competing," Furberg says, "we decided to work together and use each other as replication." Sullivan says that the groups' merger brought the total sample size to nearly 150,000 individuals.

Because each team had already done considerable work independently, the groups agreed to reconvene once each had identified its top 15 loci (those each group felt "were the most exciting," Furberg says) and compare lists. She says TAG built its list from P values and linkage disequilibrium, favoring SNPs that occurred in clusters over those that appeared alone, which, in the team's view, were more likely to be false positives.
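TAG's exact procedure is not described here, but the idea of favoring clustered SNPs over isolated ones can be sketched as follows. This is a hypothetical illustration: the function name, the P value thresholds, and the r² cut-off are all invented, and r² values for SNP pairs are assumed to be precomputed from a reference panel.

```python
from typing import Dict, FrozenSet, List, Tuple


def rank_supported_loci(
    pvals: Dict[str, float],
    r2: Dict[FrozenSet[str], float],
    lead_p: float = 1e-5,      # hypothetical threshold for a candidate lead SNP
    support_p: float = 1e-3,   # hypothetical threshold for a supporting neighbor
    min_r2: float = 0.5,       # hypothetical LD cut-off defining a "cluster"
    top_n: int = 15,
) -> List[Tuple[str, float, int]]:
    """Return up to top_n (snp, p, n_supporting) tuples, best P value first."""

    def ld(a: str, b: str) -> float:
        return r2.get(frozenset((a, b)), 0.0)  # unknown pairs treated as unlinked

    selected: List[Tuple[str, float, int]] = []
    for snp, p in sorted(pvals.items(), key=lambda kv: kv[1]):
        if p >= lead_p or len(selected) == top_n:
            break
        # Skip SNPs already represented by a stronger lead in the same LD block
        if any(ld(snp, lead) >= min_r2 for lead, _, _ in selected):
            continue
        # Count correlated neighbors that also show at least modest association
        support = sum(
            1 for other, q in pvals.items()
            if other != snp and q < support_p and ld(snp, other) >= min_r2
        )
        if support > 0:  # an isolated hit is treated as a likely false positive
            selected.append((snp, p, support))
    return selected
```

Here a strong but solitary SNP is dropped, while a slightly weaker SNP backed by correlated neighbors makes the list.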

When the three groups exchanged lists, they saw little overlap. While this might seem disappointing, it has actually invigorated the researchers, prompting further investigations that probe deeper into the genome.

Expecting too much of GWAS?

Discouragement over inconclusive GWAS results may stem from inflated expectations of these studies. Rob Hegele, director of the Blackburn Cardiovascular Genetics Lab at the Robarts Research Institute in Ontario, says that while the genetic component of human disease is likely to be large, results from big genome-wide genotyping projects are only a small piece of the puzzle. Hegele calls GWAS "one of the most exciting things" he has witnessed in his 25 years in the field, but says "it all needs to be kept in context." Though GWAS have undeniably advanced the field, he says, "it's a relatively incremental advance. It's still relatively modest compared to the total amount of variation that's unexplained."

Sullivan acknowledges that TAG's data "only speak baby steps," and that genotyping assays are imperfect. "Our technology has blind spots — or corners that we can only peer very thinly into," he says, adding that "there are a lot of people that are very impatient, and justifiably so, with GWAS. People are clamoring for there to be something clinically tangible coming out of these studies … and I think people really lost some perspective on the whole business."

"The reason why most people began these studies was not because they were [trying] to come up with a clinical test of any sort. The reason this work was [initially] done was to get a handle, a starting point, into the fundamentals of a disorder by studying DNA sequence variation," Sullivan says.

Furberg says that TAG's GWAS meta-analysis of 16 genome-wide genotyping studies is an "important first step," and reiterates that even in GWAS that produce replicable hits, the markers implicated in disease risk are "not associated with strong increases in risk — they're associated with very subtle increases in risk for whatever you're studying."

Even so, Sullivan maintains that there "may be some context in which this genetic information," such as the data generated by the TAG effort, could prove relevant.

Promise for pharmacogenomics

Researchers and clinicians have already demonstrated the potential impact that genome-wide genotyping in population studies could have on public health. Many agree that GWAS results, even if not directly applicable in the clinic, hold promise for pharmacogenetic and, on a larger scale, pharmacogenomic investigations.

In an upcoming Genome Medicine commentary, Furberg and Sullivan, along with Jamie Ostroff, director of the smoking cessation program at Sloan-Kettering, and Caryn Lerman, director of the Tobacco Use Research Center at the University of Pennsylvania School of Medicine, suggest that translating GWAS results into pharmacogenetic tests that tailor cessation treatment to individual patients is, at present, the most promising use for this type of data.

"To date, pharmacogenetic trials of smoking cessation suggest that genetic variation in nicotine metabolizing enzymes and dopamine and opioid pathways may play a role in [nicotine replacement therapy] efficacy, while variation in the dopamine pathway may be relevant for response to bupropion," they write, adding that "GWAS in pharmacogenetic studies may have even greater potential to identify loci associated with improved cessation and, in particular, rare adverse events."

Muin Khoury, director of the National Office of Public Health Genomics at the Centers for Disease Control and Prevention, says that GWAS hits could eventually be used to develop additional genetic tests. These studies could also enhance researchers' and clinicians' "ability to integrate genetic knowledge [in order to] sharpen interventions," and help determine whether a patient would be "benefited by or hurt by" particular therapies.

Khoury launched the Human Genome Epidemiology Network, or HuGENet, in 1998 for this very reason. Since 2001, HuGENet has maintained a curated database of population-based epidemiologic studies of human genes drawn from PubMed-indexed publications. Its companion tool, the HuGE Navigator, includes data on genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluations of genetic tests.

"Pharmacogenomics is a promising area," Khoury says, "and GWAS is a tool to discover relevant genes." When it comes to "applying genomics to epidemiologic studies, I think we should do more of it and do it now," he says.

Analyzing the analytics

Marilyn Cornelis, a research fellow at the Harvard School of Public Health and a member of the GENEVA Consortium, a group that aims to assess the role of gene-environment interactions in the post-GWAS era, emphasizes that environmental and lifestyle factors can modify genetic effects, and that this interplay matters for public health.

"Considering G×E [gene-environment interactions] might improve our ability to identify risk loci missed by studies focused only on the 'G,'" Cornelis says.
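Cornelis's point can be illustrated with a toy simulation, which is hypothetical and not a GENEVA method: a variant raises a trait only in exposed individuals, so an analysis of the "G" alone averages the effect over exposed and unexposed people and sees a weaker signal. Stratifying by exposure is the simplest way to look for this; the allele frequency, exposure rate, and effect size below are made-up values.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def simulate_gxe(n=100_000, effect_if_exposed=0.5, seed=7):
    """Hypothetical trait in which the variant matters only for exposed people."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count, 0 to 2
        e = int(rng.random() < 0.5)                       # binary exposure
        y = effect_if_exposed * g * e + rng.gauss(0, 1)   # G effect only when E = 1
        rows.append((g, e, y))
    return rows


rows = simulate_gxe()
marginal = slope([g for g, _, _ in rows], [y for _, _, y in rows])  # "G" only
by_stratum = {}
for level in (0, 1):
    sub = [(g, y) for g, e, y in rows if e == level]
    by_stratum[level] = slope([g for g, _ in sub], [y for _, y in sub])
print(f"G-only slope: {marginal:.3f}")
print(f"unexposed: {by_stratum[0]:.3f}, exposed: {by_stratum[1]:.3f}")
```

With half the sample exposed, the G-only slope should land near half the within-exposed effect (about 0.25 versus 0.5), the kind of dilution that can push a real locus below a genome-wide significance threshold.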

Patrick Sleiman of the Children's Hospital of Philadelphia Research Institute says that "the problem with gene-environment interactions is in defining the correct environmental variable," as confounding is a common issue. As a hypothetical example, he says, if one wishes to study environmental or socioeconomic variables associated with lung disease, it wouldn't be correct to "conclude that you're more likely to develop lung cancer because you're poor."

"In effect," he says, "you're not asking the right question. The fact that you are more likely to smoke if you are poor and therefore develop lung cancer is confounding your analysis."

Sleiman suggests using Mendelian randomization as a surrogate analytical tool; this method "entails the use of genetic variants as proxies for the environmental exposures under investigation," and its power "lies in its ability to avoid the often substantial confounding seen in conventional observational epidemiology," he writes in a February 2010 Clinical Chemistry review.

The use of Mendelian randomization is just one way that researchers can improve their GWAS analyses, the Robarts Research Institute's Hegele says. The advantage is that this method can be applied post-GWAS, "after the markers have been implicated," he says.

"Mendelian randomization has proven to be really valuable when you're trying to infer the stages in the linkage, in the chain of causation," he says. Hegele likens the method to the gold standard of clinical research, the randomized controlled trial: because alleles are assigned at random at conception, asking whether a specific allele causes a downstream trait resembles asking whether a drug works when patients are randomly assigned to receive the medication or a placebo. Mendelian randomization has historically been used to rule out causal claims such as "low cholesterol causes cancer," Hegele says, a claim suggested simply because cancer patients often have low cholesterol.
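The logic Sleiman and Hegele describe can be sketched with simulated data. In this hypothetical example, an unmeasured confounder (think of the poverty example above) raises both an exposure and an outcome, while the exposure itself has no causal effect on the outcome. A variant that shifts the exposure, and is independent of the confounder because alleles are assigned at random, serves as the instrument. The Wald ratio (variant-outcome covariance divided by variant-exposure covariance) is the simplest Mendelian randomization estimator; all parameter values are assumptions chosen for illustration.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n=200_000, true_effect=0.0, seed=1):
    """Hypothetical data: confounder U drives exposure X and outcome Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                                # unmeasured confounder
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)   # allele count, 0 to 2
        xi = 0.5 * gi + u + rng.gauss(0, 1)                # exposure
        yi = true_effect * xi + u + rng.gauss(0, 1)        # outcome
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
observational = covariance(x, y) / covariance(x, x)  # naive slope of Y on X
wald_ratio = covariance(g, y) / covariance(g, x)     # MR estimate, G as instrument
print(f"observational slope: {observational:.3f}")
print(f"Mendelian randomization estimate: {wald_ratio:.3f}")
```

Because the confounder feeds both variables, the naive slope should come out around 0.47 in this setup even though the true effect is zero, while the Wald ratio should stay close to zero. Real analyses also need the variant to affect the outcome only through the exposure, an assumption this simulation builds in by construction.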

Phenomics, he says, could offer another critical improvement for GWAS analyses. While it's now possible to achieve a "huge level of granularity and specificity at the level of the genotype," Hegele says researchers ought to apply the "same kind of rigor, care, and comprehensiveness" to classifying phenotypic traits. Because GWAS relate genotype calls from "tremendously accurate" microarrays back to how a trait presents, he says it's imperative to bring the same "scientific quality and quantity" to both ends of the equation. As refined phenotypes are developed, he suggests, "new associations [could] start to show up" in previously established data sets.

Where SNPs meet CNVs

Whole-genome sequencing is not to be ignored, even though its applicability in the broad context of public health has been debated in the literature. The falling costs of reagents and the diversification of sequencing platforms are likely to facilitate the use of whole-genome sequencing to identify elusive rare variants for disease risk.

Hegele suggests that deep re-sequencing — "deep" in that only patients with extreme phenotypes are interrogated intensively — could be useful as a discovery tool for rare variants.

"With the GWAS we're just looking at common variants — the common SNPs," he says. "With sequencing, I suspect we'll also find rare variants. Taking that to the next level would then require sequencing the whole genome in patients and then seeing whatever falls out in known and novel genes."

Non-linear effects

Going forward, Hegele expects to see genomic investigations employ a mosaic approach. "Certainly, common variants are going to explain a percentage of it," he says, "and then another percentage would be rare variants, [and] copy number variants with larger effect sizes, and I think [while] cumulatively the genetic component is going to be substantial … individually, most variants will have small effect sizes."

The underlying issue, Hegele says, is that most studies rely on linear statistics. "There's going to be non-linear effects: effects related to time, higher level interactions like gene-gene interactions, gene-environment interactions, differences related to geographical ancestry or ethnicity," he says. Some variants, for example, may matter in one ancestral population but not in others.

Ethical considerations

Esteban Burchard, an associate professor at the University of California, San Francisco, has performed multiple GWAS for asthma risk in African-American and Hispanic populations. To date, it has been especially difficult to identify robust and reproducible genome-wide hits in these studies, he says, because they're limited by the commercial availability of genotyping platforms suited to diverse populations. "The original GWAS chips were developed for Europeans," Burchard says. "The problem is that they don't 'fit' for non-European populations."

To combat the issue, researchers at UCSF have teamed up with Affymetrix to design myriad population-specific chips, based on the results of their $25 million American Recovery and Reinvestment Act-funded GWAS effort to genotype 100,000 individuals of various ancestries.

Burchard maintains that while some genetic risk factors are shared ubiquitously, others are unique to specific populations — a hypothesis he and his colleagues have tested as part of the EVE Consortium's effort to analyze GWAS data for asthma across African-, European-, and Latino-American populations.

They expect to publish data later this year, Burchard says, but it's clear that "there are some genetic risk factors that are common to all groups."

"There are likely to be genetic risk factors that are common: whether you're male, female, black, white, blue, or green, high cholesterol is a problem," he says. "But the genetic risk factor for carbamazepine toxicity is only specific to Asians."

TAG's Furberg is cognizant of this principle. For her part, she plans to extend her team's findings to meta-analyses of additional cohorts of African-American and Hispanic descent.

The ethics of communicating genetic risk, Furberg says, is another looming public health issue. Harvard's Cornelis agrees. "Simply telling someone they are genetically susceptible to a disease isn't very useful, and may not even be ethical," Cornelis says. "Telling someone they are genetically susceptible to disease and providing them with advice on how they can modify that risk — the 'E' in G×E — however, is useful."

Furberg agrees with this sentiment, to a point. A potential complication, she says, is that reporting genetic risk can be a "double-edged sword." On one hand, telling patients that they are at high risk for developing a disease might prompt them to change their behavior for the better; on the other, telling patients that they are at low risk could bolster a "false re-assurance that they can keep doing what they're doing," she says. Furberg maintains that improving how low-penetrance risk is communicated to the public is also essential.

Overall, the Robarts Research Institute's Hegele says, scientists should play it safe. "There may even be ethical issues that we don't even understand yet, or, in fact, whose consequences we haven't fully anticipated," he says. "We need to continue to be careful because I think we don't understand all of the issues that could arise."

Communicating GWAS results is something that should never be taken lightly, UNC's Sullivan says. When researchers merge their data, it's imperative to evaluate the methodologies behind their findings along the way. TAG, ENGAGE, and the Oxford/GSK team are taking every precaution to do just that when they combine their separate meta-analyses into one "mega-analysis" — the next phase of their research efforts, Furberg says.

"The key issue, as always, is: is our science good enough? In other words, and especially in the public health context, you don't want to be in the kind of situation where as scientists you say one thing at one time and then a few years later you rescind it," Sullivan says. "That just ends up confusing the public, and, usually, as an effort, these [scientific] voices lose credence, and that's not good."

"Getting the science right is first," Sullivan says. "And I think our work is a big step in that direction."

One-hit wonders

The CDC's Khoury says that while "most public health programs essentially don't use genetic information right now," the future for epidemiological genomics is "very bright because we have all these tools and they're becoming cheaper and cheaper and more widely available." Still, he says, it's important to realize that clinical translation will take time. "There's so much more work to be done after you find these hits," he says. "It takes years."

This is something that Sullivan and other genetic epidemiologists know very well. Researchers need to foster a greater public "understanding of the time that's really required to go from a GWAS signal to really understanding deeply what it means," he says. "One of my colleagues [even] says — with a bit of exaggeration — that every GWAS hit is a career."