Claude dePamphilis, Floral Genome Project, Stalking Gene Expression


At A Glance:

  • Claude dePamphilis, associate professor of biology, Penn State University.
  • 1977 — AB, Oberlin College
  • 1988 — PhD, University of Georgia
  • Postdoc — (1988-1989) NSF, University of Michigan; (1989-1990), Indiana University.

Gregor Mendel did his experimental gardening work with one plant, the pea.

Claude dePamphilis and the researchers on the NSF’s Floral Genome Project will be doing theirs with 15 other plants, none of them the pea.

The Floral Genome Project is a $7.4 million, three-year effort to investigate the genetic architecture of flowers and to develop methods for evolutionary functional genomics.

An associate professor of biology at Penn State University, dePamphilis is the principal investigator of a project that will sequence the plants, analyze gene expression in early-flower development, and then crunch the numbers that bloom from the previous work.

The plants selected are: Zamia floridana, Welwitschia mirabilis, Amborella trichopoda, Nuphar advena, Persea americana, Liriodendron tulipifera, Acorus gramineus, Saruma henryi, Eschscholzia californica, Illicium parviflorum, Asparagus officinalis, Vaccinium sp., Cucumis sativus, Beta vulgaris, and Ribes americanum. It wouldn’t be a pretty bouquet, but it would include California poppies, blueberries, avocados, tulip poplars, star anise, cucumbers, beets, currants, and the water lily, to mention a few of the better-known plants. It would also include exotics like Welwitschia, which grows only in the Namib Desert, and Amborella, which grows in the cloud forest of New Caledonia and which some regard, along with the water lily, as among the oldest flowering plants.

In a project of sweeping scope, and one that may have long-lasting benefits for researchers, the Floral Genome Project researchers will first sequence this bouquet of 15 plants (13 angiosperms and two gymnosperms), seeking genomic clues to flower origins, floral development, and the evolution of floral diversity.

The researchers include: Hong Ma (Penn State), Webb Miller (Penn State), Steven Tanksley (Cornell), Jeff Doyle (Cornell), Douglas Soltis (U. Florida), Pamela Soltis (U. Florida), David Oppenheimer (Alabama), Michael Frohlich (Museum of Natural History, London), Dawn Field (Oxford), Victor Albert (U. Oslo), and Günter Theissen (U. Jena).

BioArray News recently spoke with dePamphilis.

You have picked quite a bouquet of flowers to examine. Why did you pick the ones you did?

The idea was to select representative species from throughout the flowering plants. We were trying to include plants that were either of great evolutionary interest, or already economically important plants. We didn’t include tomato, rice, and Arabidopsis — obviously there will be plenty of data for those species. We selectively added 15 species so that the phylogenetic diversity of flowering plants was well covered.

We had several criteria for what would be a good choice for these studies. Our criteria included: looking for plants that already had some developmental data, if possible; plants that had small genomes, because we are anticipating that they will be used a lot now that there will be as many as 10,000 ESTs to get them jump-started; plants that would have diploid genetics; flowers that could be relatively easily sampled, and with structures that we could easily study. We included plants that we knew could be transformed, whenever possible, so that there was an opportunity for functional genetic studies down the line.

This is a huge project. Can you describe the scope?

The project began formally in Oct. 2001, when funding from NSF began. The ideas began a year and a half before that.

We will be sequencing a total of 100,000 ESTs from early flower development, distributed through the 15 species. That will be about 10,000 from nine species and a few thousand from the other ones. There is a finished sequencing component: Once we have identified ESTs, we will be selecting genes that are of interest, homologs of known developmental regulators, or genes that we have other reason to think might be of interest in flower development. And, we will perform finished sequencing on them so we can get high-quality data to support phylogenetic and molecular evolutionary analysis.
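The sequencing budget described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the totals are the approximate figures quoted in the interview, and the even split across the remaining species is an assumption, not something the project states.

```python
# Approximate figures quoted in the interview.
TOTAL_ESTS = 100_000      # ESTs planned across the whole project
SPECIES = 15              # species in the study
DEEP_SPECIES = 9          # species sequenced to about 10,000 ESTs each
ESTS_PER_DEEP = 10_000

deep_total = DEEP_SPECIES * ESTS_PER_DEEP   # ESTs used by the nine deeply sampled species
remaining = TOTAL_ESTS - deep_total         # ESTs left for the other species
shallow_species = SPECIES - DEEP_SPECIES    # number of remaining species

# Assumption: the remainder is split evenly, which the source does not say.
per_shallow = remaining / shallow_species

print(f"{deep_total} ESTs for {DEEP_SPECIES} species, "
      f"about {per_shallow:.0f} each for the other {shallow_species}")
```

Under these rounded numbers the remaining six species would average a little under 2,000 ESTs each, so the quoted figures should be read as approximate.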

The second part is gene expression studies. Once we have identified genes that are potentially of interest, that are homologs of known regulators, or are otherwise of interest to us, we will be performing at least two kinds of gene expression analysis: classical in situ hybridizations, and more extensive microarray studies of all of the genes that we have captured in the EST studies. The microarray analysis will be directed at questions like: Are the genes that we have detected in the floral EST sets floral-specific or preferentially expressed in flowers, and in what flower structures are they expressed? We will be looking at no fewer than three developmental stages of flowers and at different structures in the mature flower, and comparing these with, at a minimum, leaves and fruits.
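One simple way to frame the "preferentially expressed in flowers" question is as a log ratio between floral and non-floral signal. The sketch below is a hypothetical illustration, not the project's analysis method: the gene names, intensity values, pseudocount, and two-fold threshold are all assumptions made for the example.

```python
import math

def log2_ratio(floral, reference, pseudocount=1.0):
    """Log2 ratio of floral to reference intensity; the pseudocount avoids division by zero."""
    return math.log2((floral + pseudocount) / (reference + pseudocount))

def floral_preferential(intensities, min_log2=1.0):
    """Return genes whose flower signal exceeds every other tissue by at least min_log2.

    intensities maps gene -> {tissue: mean intensity} and must include a "flower" entry.
    The value stored for each flagged gene is its smallest flower-vs-tissue log2 ratio.
    """
    flagged = {}
    for gene, by_tissue in intensities.items():
        flower = by_tissue["flower"]
        others = [value for tissue, value in by_tissue.items() if tissue != "flower"]
        weakest = min(log2_ratio(flower, other) for other in others)
        if weakest >= min_log2:
            flagged[gene] = weakest
    return flagged

# Hypothetical intensities for two made-up genes.
example = {
    "gene_A": {"flower": 799.0, "leaf": 99.0, "fruit": 199.0},
    "gene_B": {"flower": 300.0, "leaf": 280.0, "fruit": 310.0},
}
result = floral_preferential(example)
print(result)
```

Here only gene_A is flagged, because its flower signal is at least two-fold above both leaf and fruit. A real microarray analysis would also need replicates, normalization, and a statistical test rather than a fixed cutoff.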

The third component is informatics. We represent a lot of interest in evolutionary biology, DNA sequence analysis, and phylogenetic analysis. We have devoted a good bit of our efforts to providing databasing for ESTs and consistent analysis of ESTs, and, from there, we will perform very rigorous work on gene families, including phylogenetic and molecular evolutionary analysis of the gene families that we detect in our studies. One of the things that we have already done is to begin assembling a better collection and deeper understanding of how genes in Arabidopsis and rice are organized into related sets that we are calling families. So, we have been doing new analysis to better understand all of the gene families in these two genomes. From that, we will link the EST data, and also link our expression studies, including an evolutionary perspective of how expression has changed through time in individual genes and gene families.

This project is really large. We have so many species, each of which has become a small individual research project: obtaining enough tissue; making sure that the plants could be retained in cultivation; understanding enough about their floral development that we could proceed with interpreting in situ hybridization and perform similar sampling; and, finally, doing the RNA isolations from tissues of plants that have never been worked with before. That was, in some cases, quite challenging, so we had to develop new approaches to sample the tissues and get the RNA out.

Where are you now in the project?

We are early in the gene expression work. We have to capture the ESTs first, and we are close to finishing the first EST library.

How will you handle the data and analysis?

We meet every week with statisticians, computer scientists, and biologists to discuss the current literature and approaches to analysis.

We have a team that includes computer scientists and biologists with great interest in informatics. Our two lead programmers are trained as biologists and have strong computing and bioinformatics backgrounds. Also working with us is Webb Miller, one of the authors of the BLAST algorithm. We also have as a collaborator Steve Ferris of Sweden, who is one of the world’s leading experts in computer analysis of phylogenetics, and Dawn Field, who is working on broad-scale informatics analysis of genes and genomes at Oxford. She’s been helping us to assemble our analysis pipeline.

There is a larger and larger need for supplemental data to go along with everything we would like to publish, and it’s really important to have standards, both in the microarray world and, we think, for the deposition of expressed sequence tag information. We will follow the MIAME standards, and we are thinking very hard about similar standards for the EST data and the sequencing data that we are depositing.

Where will the analysis be done?

We are going to use spotted microarrays, although we have kept open, at least until the end, the option of using either spotted microarrays or synthesized large oligos. But Affy arrays are clearly out of the question for us; those microarrays are much more expensive to design and make. And we don’t have the benefit of full genome sequence data. We are going to use the resources we are generating: the cDNAs themselves.

We have a core lab for microarray analysis at Penn State, and also a similar lab at the University of Florida. We have decided to split the microarray analysis equally between Penn State and Florida, using a very carefully constructed set of controls that will be placed identically on each chip. We also are primarily doing different species at different locations.

We had to create a databasing system for all of the different parts of the project and we created a LIMS system here, which is helping to keep track of all the samples, the data, and experiments, and link it all together.

What will this work contribute to science?

Our belief is that this project could make important contributions in at least two very immediate areas. One is practical, by helping to understand if there is a core set of genes that may perform very similar functions throughout the flowering plants. Another important goal is understanding how the floral genome was assembled and how much it has changed throughout the flowering plants. This is a deep evolutionary question about how flowers originated and then diversified. Gathering these data, even if not complete, will provide a lot of insight into how the floral developmental program was assembled.
