At A Glance
2003 - Present: Branch Chief, NASA Ames Research Center
2001 - Present: Director of the Genome Facility, NASA Ames Research Center
1999 - 2000: Postdoctoral Fellow in Genomics, Stanford University (worked with Ronald Davis)
1998: PhD in Biochemistry, Yale University (worked with Sidney Altman)
1990: BA in Biochemistry, University of Pittsburgh
You recently published a paper in PNAS in which you characterized bar-coded yeast gene-deletion strains that are quantified using microarrays. Can you briefly describe this system?
The molecular bar codes comprise 20-mer sequences that were picked to be unique for hybridization to high-density oligonucleotide arrays. Each strain in the yeast-deletion collection, which consists of over 6,000 gene-deletion strains, carries one or two copies of unique molecular bar codes, which can be amplified by PCR with common primers and hybridized to complements of those 20-mers synthesized on the surface of the arrays. The hybridization intensity of each bar code correlates with the abundance of the corresponding strain in a pool of all the strains, so all the strains can be analyzed in parallel. The system was invented at the Stanford Genome Technology Center by Ronald Davis and his graduate student and was published in 1996. The arrays we used in the PNAS study were produced by Affymetrix.
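The readout logic described above can be sketched in a few lines: each strain's one or two tag intensities are averaged, and each strain's share of the pool is taken from its fraction of the total signal. This is a toy illustration, not the published analysis; the strain names, tags, and intensity values are invented.

```python
# Toy sketch of the bar-code readout: each deletion strain carries one or
# two unique 20-mer tags, and the hybridization intensity of each tag is
# used as a proxy for that strain's abundance in the pooled culture.
# All names and numbers below are invented for illustration.

def relative_abundance(intensities):
    """Map {strain: [tag intensities]} to each strain's share of the pool."""
    # Average the two tag signals for strains that carry both bar codes.
    per_strain = {s: sum(v) / len(v) for s, v in intensities.items()}
    total = sum(per_strain.values())
    return {s: x / total for s, x in per_strain.items()}

pool = {
    "his3-del": [1200.0, 1100.0],   # two bar codes hybridized
    "ade2-del": [450.0],            # only one usable bar code
    "lys2-del": [900.0, 850.0],
}
shares = relative_abundance(pool)   # fractions summing to 1.0
```

Averaging the two tags is one simple choice; because most strains carry two independent bar codes, a defect in one tag can be tolerated without losing the strain's signal entirely.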
What were your findings, and what are their implications?
We found mutations, or defects, in as many as 30 percent of the designed [bar code] sequences, which was something we did not originally expect. Surprisingly, despite the high abundance of defects, the overall performance of the hybridization assay on the arrays is not affected. That's because a few defects can be tolerated within the 20-mer bar code, and because of the redundancy that is built into the system; as I mentioned, most of the strains contain two unique bar codes.
The assay can now be further optimized for accuracy either by redesigning the more severely affected bar codes, those in which more than one nucleotide was changed or deleted, and remaking those strains, or, for bar codes whose defects are minor, by compensating for the defects when re-synthesizing the high-density oligonucleotide arrays.
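The triage described here, minor defects versus bar codes severely affected by more than one changed or deleted base, can be illustrated with a simple comparison of each observed tag against its designed sequence. This is a hypothetical sketch: for simplicity it treats any length change as severe, and the sequences in the example are made up.

```python
# Illustrative classification of sequenced bar codes against their designed
# 20-mers, in the spirit of the defect triage described above. The "more
# than one nucleotide changed or deleted = severe" threshold follows the
# text; treating any length change as severe is a simplifying assumption.

def classify_barcode(designed, observed):
    """Return 'intact', 'minor', or 'severe' for an observed bar code."""
    if observed == designed:
        return "intact"
    if len(observed) != len(designed):
        return "severe"          # base(s) deleted or inserted
    mismatches = sum(a != b for a, b in zip(designed, observed))
    return "minor" if mismatches == 1 else "severe"

designed = "ACGTACGTACGTACGTACGT"          # invented 20-mer
print(classify_barcode(designed, "ACGTACGTACGTACGTACGA"))  # one substitution
```

A "severe" call would route the bar code toward strain redesign, while a "minor" call could instead be compensated on the re-synthesized array.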
What are you using the system for now?
We now know which bar codes are mutated and which ones are exactly as they were designed. The accuracy of the assay can now be trusted for applications like high-throughput drug screens, which we hope to do in the future. For example, we are looking at using the yeast-deletion strains to study the effects of radiation on DNA damage and repair. We are going to expose the deletion strains to different doses of ionizing radiation in order to advance our understanding of the effects of radiation on biological systems. We want to do this because NASA is planning long-term exploration missions: manned flights to the Moon and Mars. As soon as you get out of low Earth orbit, you find yourself exposed to higher doses of radiation. So we need to understand what those effects are, and perhaps develop some countermeasures. Using yeast as a model system, we can start to develop radio-protectant compounds, for example, that mitigate the effects of radiation.
How will you screen for such compounds?
Let’s say you first expose the cells to radiation and identify a deletion mutant that is more strongly affected than the wild type. You could then try to compensate [for this] with a drug that somehow enhances the function that is lacking in the mutant. In this way, you can identify drugs that have some compensatory function.
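The screening logic just described, find a radiation-hypersensitive deletion strain, then find a drug that restores its fitness toward wild type, can be sketched as two simple filters. Everything here, the strains, the fitness scale, the drug names, and the thresholds, is invented for illustration.

```python
# Toy sketch of the compensatory screen described above. A deletion strain
# whose post-irradiation fitness falls well below wild type points to a
# lacking function; a drug that restores that strain's fitness is a
# candidate radio-protectant. All data and thresholds are hypothetical.

def sensitive_strains(fitness_after_dose, wild_type, threshold=0.5):
    """Deletion strains whose irradiated fitness is well below wild type."""
    return {s for s, f in fitness_after_dose.items() if f < threshold * wild_type}

def compensatory_drugs(drug_fitness, wild_type, strain, threshold=0.8):
    """Drugs that restore a sensitive strain's fitness toward wild type."""
    return {d for d, f in drug_fitness[strain].items() if f >= threshold * wild_type}

wt = 1.0                                            # wild-type fitness (relative)
irradiated = {"rad52-del": 0.2, "yfg1-del": 0.9}    # invented strains
hits = sensitive_strains(irradiated, wt)            # hypersensitive mutants
drug_fitness = {"rad52-del": {"drugA": 0.85, "drugB": 0.3}}
protectants = compensatory_drugs(drug_fitness, wt, "rad52-del")
```

In the pooled bar-code assay, the fitness values would come from the relative strain abundances read off the array before and after treatment.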
What role do microarrays in general play in your research?
Specifically for this project, a microarray is the readout for the assay of the relative amount of each deletion strain in the pool. In addition to that, we have recently performed whole-genome [transcriptional profiling] analyses of various model organisms, including the human genome.
Genome sequencing has produced large numbers of complete genomes. We know which parts of the genome encode genes, either from the decades of genetics work, or from computational predictions, which use algorithms to predict gene structures in the genome. However, empirical assays to identify genes in the genome by transcriptional profiling have thus far not been done comprehensively. We have initiated projects to comprehensively assay transcription from the entire genome of various model organisms. The aim is to identify genes and their transcriptional activity during, for example, different stages of development, or normal growth, just to be able to annotate the genome with genes.
What kinds of arrays did you use in these studies?
They were our own oligonucleotide arrays, made using maskless array synthesizers that we purchased from NimbleGen. It is a very flexible platform that allows us to generate the content computationally first and then do the synthesis without having to produce expensive masks. We are also able to synthesize much longer oligonucleotides, which gives us better sensitivity. We have used various lengths: 36-mers, but also 40-mers, 50-mers, 60-mers, and 70-mers. We are evaluating which length gives the best combination of specificity and sensitivity. Our goal is to be able to identify less abundant transcripts that typically cannot be identified using conventional microarrays. Another reason we focused on making our own arrays is cost: it is cheaper. Also, the density of our arrays is very high; we are currently synthesizing 400,000-feature arrays, and we can potentially go up to 800,000 features.
We can detect the lower-abundance transcripts simply because arrays with longer oligonucleotides have better sensitivity than arrays with shorter oligonucleotides. Sensitivity is one aspect of detection; the other is specificity. Specificity can only be derived from a computational assessment of the uniqueness of each probe within the genome. We examine the uniqueness of each oligonucleotide, relative to all the oligonucleotides that can possibly be found in a genome, by scanning the entire genome at one-base resolution. Thus, during data processing, we can assign a significance to the hybridization intensity depending on the frequency with which a particular oligonucleotide is found within the genome. To do that, you need supercomputer resources. We have an SGI Origin 3000 supercomputer with 1,024 CPUs at the NASA Ames Genome Research Facility that we routinely use for both design and analysis of the microarray data.
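The uniqueness scan described above amounts to k-mer counting at one-base resolution. A minimal sketch, on a toy sequence rather than a genome (the real computation is genome-scale, hence the supercomputer), assuming a simple "occurs once = fully unique" weighting:

```python
# Minimal sketch of the probe-uniqueness idea: slide a window across the
# sequence one base at a time, count how often each k-mer occurs, and use
# that count to weight a probe's hybridization signal. The sequence, k,
# and the 1/n weighting scheme are illustrative assumptions.
from collections import Counter

def kmer_counts(genome, k):
    """Count every k-mer in the sequence with a one-base sliding window."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def probe_uniqueness(probe, counts):
    """1.0 for a probe occurring exactly once; lower for repeated probes."""
    n = counts.get(probe, 0)
    return 1.0 / n if n else 0.0

genome = "ACGTACGTTGCA"        # toy sequence; real input is a whole genome
counts = kmer_counts(genome, 4)
# "ACGT" occurs twice (weight 0.5); "TTGC" occurs once (weight 1.0)
```

A repeated probe's hybridization intensity would be down-weighted accordingly when significance is assigned during data processing; a production scan would also consider the reverse-complement strand, which this sketch omits.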
The development of the microarray applications for different organisms is being done through collaborative research projects with academic and industry partners. The academic partners are listed on the NASA Ames Genome Research Facility website [http://phenomorph.arc.nasa.gov/index.php].