Human Genetic Variation Alters Anthrax Toxin Sensitivity
Martchenko, Candille et al., PNAS
Researchers at Stanford University School of Medicine show that genetic variation affecting capillary morphogenesis gene 2, or CMG2, dramatically alters toxin sensitivity in humans. In its analysis, the team reports on "a CMG2 single-nucleotide polymorphism occurring frequently in African and European populations [that they found] independently altered toxin uptake." The group goes on to suggest "testing of genomically characterized human cell populations may offer a broadly useful strategy for elucidating effects of genetic variation on infectious disease susceptibility."
Q&A: LBNL's Sue Celniker on Sequencing for the modENCODE Project
Name: Susan Celniker
Age: 55
Position: Head, Department of Genome Dynamics (since 2008), and co-director, Berkeley Drosophila Sequencing Program (since 1996), Lawrence Berkeley National Laboratory
Experience and Education:
Research fellow, then senior research fellow, senior research associate, Division of Biology, California Institute of Technology, 1983-1996
PhD in biochemistry, University of North Carolina, Chapel Hill, 1983
BA in biology and anthropology, Pitzer College, Claremont, Calif., 1975
Sue Celniker heads one of 10 research groups that participate in the Model Organism Encyclopedia of DNA Elements, or modENCODE, project. The effort, launched by the National Human Genome Research Institute in 2007 and funded with $57 million over four years, aims to identify all functional elements in the genomes of the fruit fly, Drosophila melanogaster, and the roundworm, Caenorhabditis elegans.
As part of the project, Celniker's modENCODE group, which includes researchers at six other institutions, was awarded a $14.5 million grant two years ago for the "Comprehensive Characterization of the Drosophila Transcriptome." In Sequence recently spoke with Celniker, who heads the department of genome dynamics at Lawrence Berkeley National Laboratory, to find out what the project has achieved at its halfway point, and what role new sequencing technologies are playing in it.
Can you give a brief overview of the goals of modENCODE, and an update on where the project stands today?
The project was started by NHGRI to augment the human ENCODE project, which at that point had focused on only one percent of the human genome. They thought the next step would be to study worms and flies, whose genomes are about a thirtieth the size of the human genome, and then scale up to do the entire human genome. The other advantage of having worms and flies is that they are both genetic model organisms, so validation and testing of models would be significantly easier.
There are 10 groups that constitute the modENCODE consortium, with parallel groups for most projects in worm and fly. These groups study the transcriptome (including mRNAs, non-coding RNAs, transcription start sites, untranslated regions, and miRNAs), regulation of transcription with a focus on transcription-factor binding sites, chromatin marks, and DNA replication.
We just published a marker paper describing the data types being produced and our plans for data integration. Our data can be obtained from the modENCODE project website, where it can be viewed in a browser or downloaded using FTP.
When the project started, most of the new high-throughput sequencing technologies were just out of the gate. How are they being used in the modENCODE project today, and what advantages do they offer over, for example, microarrays?
The group that I head, for example, proposed to use microarrays with 38-base pair resolution to profile the fly transcriptome, and sequencing, at one-base pair resolution, is a significant increase in resolution. Most of our work in transcription profiling has been done using the Illumina sequencing technology.
So far, we have analyzed 24 cell lines by microarrays, and we have repeated four of them by RNA-seq. We have completed 31 developmental time points on microarrays, and are in the process of analyzing the same samples using RNA-seq. We will have an enormous amount of data with which to compare the two approaches. We are just in the process of collecting the RNA-seq data this summer. We have 12 samples close to being captured at about 15 million reads for each state. We want to figure out whether we reach saturation or not, and we don't know that yet.
It's easier to identify splice sites and splice variants with the one-base pair resolution. One aim of our grant is to understand the control of splicing, a project directed by Brenton Graveley at the University of Connecticut Health Center. He is knocking down components of the RNA-binding machinery using RNAi, and then sequencing the products to identify changes in splicing. All of his work is done by RNA-seq. It's very difficult to design a microarray that would capture all the different putative splice variants. We could not do that project, realistically, without having switched to RNA-seq.
We are also doing rapid amplification of cDNA ends, or RACE, and initially proposed to clone and sequence the products in order to identify transcription start sites. Now we have a pooled strategy where we can sequence hundreds of products. We have been using 454 for that, but we are planning to move to Illumina to compare the two, since 454 is more expensive than Illumina. It's been truly revolutionary, the amount of data we can capture.