Merging SNPs and Epidemiology

  • Title: Assistant Professor of Molecular Physiology and Biophysics, Vanderbilt University
  • Education: PhD, Emory University, 2000
  • Recommended by: Jonathan Haines

In an effort to better understand human phenotype-genotype correlations, Dana Crawford has set her sights on getting the most out of the mountain of SNP data that has accumulated over the last few years. Crawford applies the data to large-scale epidemiological surveys, such as the National Health and Nutrition Examination Survey (NHANES), a CDC-managed study that collected DNA samples from 7,000 Americans between 1991 and 1994. “As a human geneticist, I’m quite excited about NHANES because it has DNA samples, it’s linked to demographics, as well as an extensive questionnaire about lifestyle, such as drinking habits, exercise regimen, and medical histories,” says Crawford. “I can ask if specific SNPs [are] associated with high HDL or low HDL or high LDL/low LDL, and that gives us a really powerful tool to identify associations because we have such a large sample size.”

She also works with data from a recently launched biorepository within the Vanderbilt University hospital system that comprises DNA from leftover laboratory blood samples drawn from outpatients. “We have the first 10,000 samples already, but this is going to be a multi-year project and the aim is to have a million of these samples banked at Vanderbilt — and all linked to an electronic medical [record] where you can mine the data for phenotype-genotype correlation from that particular resource. And that’s a really exciting project to be involved with right now.”

Crawford would love to get her hands on a tool that would streamline results for whole-genome association studies with hundreds of thousands of SNPs. Currently, most researchers in the field rank SNPs by P-value, smallest first, but that leaves the chance of missing truly associated SNPs, she says. “It would be really nice if we could just push a button and it would show us which are the true associations and which are not, which are the false positives,” she says. “That right now is really hard to understand. People don’t have really good ways of correcting [for] multiple testing, so it would just be really helpful to have a sifter that gives you the truth.”
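To make the multiple-testing problem concrete, here is a minimal sketch of two standard corrections, Bonferroni and Benjamini-Hochberg, applied to a handful of made-up P-values. It is not Crawford's method or the "sifter" she describes; the function names, the example values, and the 0.05 threshold are all assumptions chosen for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag P-values that pass the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag P-values kept by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the P-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose P-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Everything ranked at or below that cutoff is declared significant.
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            flagged[idx] = True
    return flagged


# Hypothetical P-values for five SNPs (illustrative only, not real data).
example_p_values = [0.00001, 0.004, 0.025, 0.20, 0.65]

print(bonferroni(example_p_values))
print(benjamini_hochberg(example_p_values))
```

With these invented numbers, the stricter Bonferroni cutoff keeps only the two smallest P-values, while Benjamini-Hochberg also keeps the third. Neither can say which associations are true; each only controls a different kind of error rate, which is part of why a simple ranking by P-value falls short.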

Looking ahead

“I’m particularly interested in seeing some of the whole genome resequencing coming online and being able to use that technology on these particular types of biobanks,” she says. “I think it’s going to be extremely powerful.” She does, however, point out that data storage is an increasing problem with next-generation sequencing technology. “Every time you do a run on something like Solexa you have to have space for three terabytes’ worth of data, and how do you plan for that?” she says. “It’s got to be this perfect storm of computational technologies and analytical skills, so it’s just going to be quite interesting to [see] how it’s all going to play out, but it’s all heading in that direction.”

Publications of note

In 2006, Crawford and her colleagues published a paper entitled “Genetic Variation is Associated with C-Reactive Protein Levels in the Third National Health and Nutrition Examination Survey” in Circulation. This paper provides an example of using a large-scale epidemiological survey for association studies. The study looked at genetic determinants of C-reactive protein levels, which are associated with risk for coronary heart disease.

And the Nobel goes to …

Crawford says that she would like to win the Nobel for “a discovery of an association that makes it to the clinic, and makes a huge impact in either prevention or the way we treat people; a real translation from the bench to the bedside.”

