Name: Patrick Schnable
Title: Director, Center for Plant Genomics, Iowa State University
Background: 2005-present, associate director, Plant Sciences Institute, Iowa State University; 1999-present, founding director, Center for Plant Genomics, ISU; 1999-2003, founding director, Center for Plant Transformation & Gene Expression, ISU; 1998-present, professor, Iowa State University; 1988-1998, associate professor, ISU; 1986-1988, postdoc, Max Planck Institute; 1981-1986, graduate research assistant, ISU
Education: 1986 — PhD, plant breeding and cytogenetics, Iowa State University; 1981 — BSc, agronomy, Cornell University
Patrick Schnable is the lead author on the November 2009 Science paper that described for the first time the complete draft genome sequence for the B73 variety of maize, which the authors tout as "the most complex genome known to date."
One of the key findings of the paper was that the genomes from different maize lines can be extremely diverse — more different, in some cases, than the human genome is from the chimp genome.
After the decade-long effort to sequence B73, maize researchers like Schnable are turning to array comparative genomic hybridization and next-generation sequencing to untangle the differences between distinct lines of maize, information that can help plant breeders select for better quality crops.
Schnable discussed these efforts during a talk at the Plant and Animal Genomes Conference, held in San Diego this month.
Specifically, Schnable and his colleagues have been using Roche NimbleGen CGH arrays and Roche 454 sequencing to look at genomic variation between different lines of maize. Last week, BioArray News spoke with Schnable about these efforts and about the challenges presented by sequencing maize B73. Below is an edited transcript of that interview.
You maintain that the cost per gain for plant breeding is going up. Is that a long-term trend or something that has happened this decade because of the availability of new technologies?
I would say that it started sometime after the mid-1980s. Prior to then, breeding was done using mostly traditional approaches. All biotech investments since then represent new costs. Molecular biology and genomics weren't responsible for any of the genetic gain prior to the mid-1980s, so obviously we are spending more now: the cost of traditional methods has not decreased, and we've added all of the costs of genomics on top of that.
You have been working on maize for a long time, since your graduate studies. Is that a circumstance of your education or were you interested in it as a crop?
I’ve worked on maize for my entire professional career. I remember in high school, I ordered some corn that had variously colored kernels and planted the different colored kernels in different parts of the garden to see if I could try and figure out the inheritance of these colors. Of course, because I didn't control pollination, as an experiment this was a disaster. But it did spark my interest in the phenotypic diversity of maize, so when I went to graduate school, that was the natural crop for me to focus on.
In November, several papers were published on the sequencing of B73. Could you describe the significance of that? What does it mean for you as an academic and what does it mean for industry?
The maize genome has been a longstanding goal of the research community, in both the public and private sectors. The national corn growers worked with Missouri Sen. Kit Bond to establish the plant genome program at [the National Science Foundation] back in '98. The long-term goal at that time was to sequence the maize genome, but it was clear to the maize research community that the 1998 technology wasn't up to the task. So the NSF began by funding the development of technologies to sequence some smaller, less complex genomes, laying the groundwork for the eventual sequencing of maize.
So sequencing the maize genome has been a longstanding challenge, and it is very satisfying to have it completed. All of us in the maize genetics community knew that having the genome sequenced would enable a great deal of additional biology, but what has surprised me is how much more useful this genome has turned out to be than I imagined.
As we expected, just having the genome sequence has enabled a lot of experiments and analyses. But the genome sequence is also changing the way we think about experiments and how we analyze data. It's kind of like, when you think about getting a hammer you can think of some loose nails that need to be hammered, but when you actually have a hammer in your hands, you begin to think of all sorts of other projects you could undertake.
One of the points you made during your PAG presentation is that maize is a diverse species and there is an interest in explaining that diversity. What is being done in that direction?
Well, over the years there's been a lot of work trying to identify the genetic determinants of phenotypes, by, for example, making mutants, cloning genes, and traditional QTL mapping studies. During the talk I mentioned the NAM, or Nested Association Mapping, population that was developed by Ed Buckler and his colleagues [at Cornell University]. The NAM population is the next step in that process of trying to determine which genes control phenotypic variation. The NAM population was developed by identifying 25 distinct inbred lines, lines that were selected to be as distinct from each other as possible across the wide range of genetic variability of maize. Each of those lines was crossed with the B73 inbred, which is the line whose genome was sequenced, and then 200 recombinant inbred lines, or RILs, were extracted from each of those 25 crosses. This yielded 5,000 recombinant inbred lines that are being genotyped and phenotyped by a number of groups, allowing an extraordinary level of mapping resolution. Hence, this population is going to be a great resource for assigning genes to function.
So, that's all underway. We are beginning to identify the copy number variants, or CNVs, and the presence-absence variants, or PAVs, which weren't available before, and then projecting those onto the NAM recombinant inbred lines. We'll be able to ask whether the phenotypic variation that we see is controlled by CNVs or PAVs.
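The population arithmetic behind the NAM design, as described above, can be sketched in a few lines of Python: one common parent (B73), 25 diverse founders, and 200 RILs per B73 × founder family. The founder names below are hypothetical placeholders, not the actual NAM founder lines, and the sketch only models the bookkeeping of the design, not the crossing or inbreeding itself.

```python
from dataclasses import dataclass

# Design parameters as stated in the interview; founder names are hypothetical.
COMMON_PARENT = "B73"
N_FOUNDERS = 25
RILS_PER_FAMILY = 200


@dataclass(frozen=True)
class RIL:
    family: str  # the B73 x founder cross this line was extracted from
    index: int   # 1..RILS_PER_FAMILY within that family


def build_nam_population(founders, rils_per_family=RILS_PER_FAMILY):
    """Return one RIL record per line in every B73 x founder family."""
    population = []
    for founder in founders:
        family = f"{COMMON_PARENT} x {founder}"
        population.extend(RIL(family, i) for i in range(1, rils_per_family + 1))
    return population


founders = [f"founder_{k:02d}" for k in range(1, N_FOUNDERS + 1)]
nam = build_nam_population(founders)
print(len(nam), "RILs in", len({ril.family for ril in nam}), "families")
print(len(founders) + 1, "inbred parents in total")
```

The same structure explains the 26 inbreds Schnable mentions later: the 25 founders plus the common parent, B73.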
How will you determine that? What technology will you use?
We’re discovering the CNVs and PAVs using a combination of next-generation sequencing and comparative genome hybridization. After projecting them to the NAM RILs we’ll assign function to individual CNVs and PAVs via association mapping.
You mentioned that you have been using Roche NimbleGen arrays to compare B73 and a variety of maize called Missouri 17, or Mo17. Is that related to this work?
For the NAM experiments I have mentioned 26 inbreds. There's B73, which was crossed with the 25 diverse lines. Some time ago Mike Lee [from ISU] and his colleagues developed a large set of RILs from a cross between B73 and Mo17. These are called the IBM, or Intermated B73/Mo17 RILs. Because the IBM RILs were produced a long time before the rest of the NAM population, there’s been time for more analyses on them than on the NAM RILs.
This is why we began our comparative genome hybridization experiments with B73 and Mo17. We wanted to ask, 'Is there variation in copy number between these lines?' It turned out that there was a lot of copy number variation, much more than in mammalian systems.
Another big surprise was the high rate of presence-absence variation. This term refers to genes that are present in B73 but absent from the Mo17 genome, or vice versa. Next we wanted to know whether this high level of structural variation is a peculiarity of the comparison between B73 and Mo17. So we started looking at some other inbred lines, and for this we selected the parents of the NAM population. We haven’t analyzed all of them yet, but so far we have found that high levels of CNV and PAV are common. It appears to be generally true that many copy number variants and many missing genes can be detected between any two inbreds that are not closely related.
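As a rough illustration of how CGH data can flag CNVs and PAVs, here is a minimal Python sketch, assuming probes designed from the reference (B73) sequence and a per-gene summary of log2(test/reference) hybridization ratios. The thresholds, gene names, and intensity values are hypothetical and illustrative only; real CGH analyses normalize the data and calibrate their calls far more carefully.

```python
import math
from statistics import median

# Hypothetical cutoffs on the per-gene median log2(test/reference) ratio.
ABSENT_BELOW = -2.0  # signal near background: candidate PAV (gene missing in test line)
LOSS_BELOW = -0.5    # reduced signal: candidate lower copy number in test line
GAIN_ABOVE = 0.5     # increased signal: candidate higher copy number in test line


def log2_ratio(test_intensity, ref_intensity):
    """log2 of the test/reference hybridization signal for one probe."""
    return math.log2(test_intensity / ref_intensity)


def classify_gene(probe_pairs):
    """Classify one gene from its (test, reference) probe intensities.

    The reference is the line the probes were designed from (B73 here);
    the test line is, for example, Mo17.
    """
    m = median(log2_ratio(t, r) for t, r in probe_pairs)
    if m < ABSENT_BELOW:
        return "PAV: absent in test"
    if m < LOSS_BELOW:
        return "CNV: fewer copies in test"
    if m > GAIN_ABOVE:
        return "CNV: more copies in test"
    return "no change"


# Made-up intensities for three hypothetical genes, three probes each.
genes = {
    "gene_A": [(1000, 1010), (980, 1000), (1020, 990)],    # similar signal
    "gene_B": [(40, 1000), (60, 1100), (30, 900)],         # near background
    "gene_C": [(2100, 1000), (1900, 1000), (2000, 1050)],  # roughly doubled
}
for name, probes in genes.items():
    print(name, classify_gene(probes))
```

Because the probes in this sketch come from the reference sequence, it can only flag genes that are present in the reference and missing (or at reduced copy number) in the test line. Genes present in the test line but absent from the reference have no probes at all, which is one reason sequencing is used alongside CGH.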
And what technology do you use for all that work?
Comparative genomic hybridization. In our case [we're] using Roche NimbleGen microarrays. This is where the collaboration with Roche NimbleGen came in. We are using arrays designed at NimbleGen and, in fact, many of the early hybridizations were actually done there using our DNA.
Is CGH a tool you are familiar with? Is this a tool that others in the maize community would use?
Yes, it is something I know has been used in a number of systems for a while. As far as I know, it hadn't been used in maize previously. This is probably because CGH experiments are most informative if a genome sequence is available. Once the maize genome sequence became available, even in draft form, it made CGH experiments much more exciting.
When did you have the draft sequence and when did you start designing the CGH study?
The genome was sequenced using a BAC-by-BAC approach. BACs were organized into minimum tiling paths by our colleagues at the University of Arizona, and those BACs were sequenced at Washington University, Cold Spring Harbor Laboratory, and the University of Arizona. In February of '08, the sequences of many of those BACs were released, so our team had access to that information when we began designing the NimbleGen arrays.
How did you become acquainted with Roche NimbleGen technology? You could have worked with another vendor.
One of my former graduate students, Yan Fu, alerted me to a paper from Dick McCombie’s lab [at Cold Spring Harbor] on sequence capture from a mammalian system that used the Roche NimbleGen technology. At the time we were cloning a number of eQTL and phenotypic QTL, so we'd zoomed in on regions of the genome for [which] we needed more genetic markers.
When I saw this paper, it was clear we needed to get access to the sequence-capture technology, so I contacted NimbleGen. I got in touch with Jeff Jeddeloh, who is one of their senior scientists, and we set up a collaboration that involved my lab, Nathan Springer at the University of Minnesota, and Brad Barbazuk at the University of Florida to make sequence capture work in maize. At the time we realized there would be some substantial challenges to making this work in complex plant genomes, and it did take a while to get it working, but we did get it working, and a paper that outlines the procedure is under revision.
While we were establishing the sequence-capture technology for maize, we realized that in order to understand the results we were getting, it would be nice to have some CGH data, and so that's what motivated that work, at least initially.
What are some of the most important things that you want to accomplish in your research right now?
I've alluded to a big part of this already, which is to finish comparative genome hybridizations on all the NAM parents to discover CNVs and PAVs, project those to the NAM RILs, and then do association mapping to ask whether particular structural variations are associated with specific patterns of phenotypic variation.
If this works out as we expect, it will provide a new tool for breeders. As they look at their lines and try to identify which alleles are important for agronomic traits, they already know that SNPs are important, but structural variation is also likely to be important. However, it's mostly being ignored right now because it isn’t being assayed in breeding lines. If you sequence the genome of just one line, you won't be able to identify CNVs or PAVs. What I mean is, if you have sequenced only one line and a gene is not there, you have no way of knowing that it is missing. CGH allows us to detect missing genes without resequencing whole genomes.
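The association-mapping step Schnable describes can be reduced, for illustration, to a single comparison: for one structural variant projected onto the RILs, do lines that carry it differ in a trait from lines that lack it? The Python sketch below uses Welch's t statistic on made-up data; the function names, RIL names, and trait values are hypothetical. A real NAM analysis would also need to account for family structure, background marker effects, and multiple testing across thousands of variants.

```python
import math
from statistics import mean, variance


def welch_t(group_1, group_2):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(group_1) / len(group_1) + variance(group_2) / len(group_2))
    return (mean(group_1) - mean(group_2)) / se


def score_structural_variant(genotypes, phenotypes):
    """Compare trait values of RILs that carry a variant with those that lack it.

    genotypes: dict mapping RIL name -> True if the gene (or extra copy) is present
    phenotypes: dict mapping RIL name -> trait value, e.g. a stalk-strength score
    Returns (difference in means, Welch t statistic).
    """
    present = [phenotypes[ril] for ril, has in genotypes.items() if has and ril in phenotypes]
    absent = [phenotypes[ril] for ril, has in genotypes.items() if not has and ril in phenotypes]
    return mean(present) - mean(absent), welch_t(present, absent)


# Made-up example data for six hypothetical RILs.
genotypes = {"RIL1": True, "RIL2": True, "RIL3": True,
             "RIL4": False, "RIL5": False, "RIL6": False}
phenotypes = {"RIL1": 10.0, "RIL2": 11.0, "RIL3": 12.0,
              "RIL4": 7.0, "RIL5": 8.0, "RIL6": 9.0}
diff, t = score_structural_variant(genotypes, phenotypes)
print(f"difference in means = {diff:.2f}, Welch t = {t:.2f}")
```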
You mention the breeders. It seems like they are paying attention to what is going on and they are keen to implement this knowledge into their programs, but how far apart are these two things: research and it actually being implemented by breeders?
I would say that the major seed companies right now are very cognizant of what is going on in the basic science of genomics. In maize, which is the species I know best, they are very aware of it; they have people in their organizations who were trained in genomics and bioinformatics. So they are staying abreast of the new activities that are coming out of the public sector, and they are evaluating each one and asking whether it will help them develop the next greatest hybrid. So they are very interested in CNVs and PAVs, and I suspect that what we've found will start being used by industry, probably in the near future.
One challenge facing scientists must be how to adopt all the new technology platforms that are coming out. How do you navigate your way through this exhibit hall of things you can do?
Identifying the most appropriate technologies is challenging. I attend a lot of meetings and talk to a lot of people. I also appreciate the fact that I've got great students and staff. I rely very much on them because they tend to be very open to new ideas.
Identifying the right technologies is especially challenging for the smaller labs that aren't focused on genomics. I went to a really interesting talk a few years ago by a guy who was studying how butterflies flap their wings. He was interested in which genes were expressed in the organ that moves the wing. This was clearly a question designed for next-generation sequencing of the transcriptome of those cells. The challenge for us as a community is to make sure that people like that, who have a real expertise in a particular biological question, can use the appropriate genomic technologies to move their science forward without having to become genomicists and bioinformaticians.
I think on some campuses that's happening. There are central user facilities that provide sequencing technology, and that's good. The domain expert can bring samples over, have them sequenced, and get the sequence back. But there also needs to be the analytical capacity to support this sort of central user facility. There need to be bioinformatics staff who can deliver the data to the biologist in a format that will be familiar to them. Some campuses are working on doing that.
Have you been inspired by other projects? It's been said that the bovine community has set an example for other animal and plant-focused research communities.
The maize genetics community is definitely a trailblazer in terms of science and technology among the crops. It's a large, very vibrant community. Often, the maize community does something and then other crops recognize that and move in a similar direction. There is also cross-fertilization in that maize people will sometimes work in other crops, particularly other grasses, and bring their expertise, their worldview, to those other species. Given the close evolutionary genetic relationships among the grasses, that can be very productive. Those other species can in turn inform our own studies of maize.
I learn a lot from non-plant groups. As I mentioned, we were inspired to tackle sequence capture by Dick McCombie’s paper. I was very pleased to have an abstract on CNVs and PAVs accepted for the [Advances in Genome Biology and Technology] meeting in Marco Island, [Fla.,] and I am looking forward to attending that meeting for the first time next month. I expect to learn a lot.
You are using CGH right now. Are there any other methods for gaining insight into what you are studying?
We have been doing a lot of 454 sequencing for about three or four years now. We have been adapting that. One of the papers that came out as a companion to the maize sequencing paper reports on a new technology developed by one of my students, Sanzhen Liu, that allowed us to amplify DNA next to transposons. There are some active high copy transposons in maize, and we wanted to know where they insert. Sanzhen came up with this really clever way to amplify the DNA next to a newly inserted transposon, even though we don't know the adjacent sequence. The 454 technology was great for that because we got fairly long reads, which allowed us to match the 10,000 to 20,000 distinct transposon insertion sites to the genome with high confidence, so we got some great biological insight out of that.
We are also doing a lot of [Illumina] sequencing of transcriptomes and [are] also using [Illumina] sequencing to look at methylation patterns across the genome. We are working closely with Srinivas Aluru, one of my colleagues here at Iowa State, who is a parallel computing expert, to develop software for the assembly of complex genomes using next-generation sequencing technologies.
We're also working with Jinsheng Lai, a colleague at China Agricultural University, who, in collaboration with the Beijing Genomics Institute, generated some great resequencing data from a variety of maize lines. This is allowing us to explore haplotype diversity. Related to that, though I don't have it yet, I'd be very excited to work with some [Pacific Biosciences] data. I hope that will happen sometime this year.
You said that because of the success of plant breeders, so-called selective sweeps have been created, and losing genetic diversity might be a negative consequence of these programs. How can something like that situation be avoided?
I would preface it by saying that we don't know the extent of this problem. It certainly has happened, but we need to do more CGH experiments to determine how widespread this is.
Let me explain what a selective sweep is. Imagine we live in a simple world where there are two alleles: a favorable allele and a less favorable allele relative to stalk strength. Over multiple generations, breeders select plants that don’t fall over. By continuously selecting the stiff-stalked plants, the frequency of the favorable allele goes up in the population. But the allele frequencies of genes that are tightly linked to the stalk strength gene will also change. Alleles that are coupled with the favorable allele of the stalk strength gene will experience increases in allele frequency. That's not a desirable thing.
What we see is called a selective sweep. We have changed allelic diversity at a whole bunch of genes inadvertently and lost genetic diversity at these genes. These genes may have important functions that we will want to use in future breeding projects. So we have narrowed our options. To avoid that problem, it really makes sense to make use of molecular markers that are actually within the gene that we care about. Then, while selecting for the favorable allele of the stalk strength gene, [we'll] also make sure that we get crossovers on both sides of the selected gene so we break up haplotype blocks [and] we don't inadvertently change allele frequency of the other genes.
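The hitchhiking effect behind a selective sweep can be illustrated with a standard deterministic two-locus model. The Python sketch below is a simplified haploid version, not a model Schnable describes: locus 1 carries a favorable allele F (standing in for the stiff-stalk allele in his example) with selective advantage `sel`, and locus 2 carries a neutral allele A that starts out coupled with F. The starting frequencies, selection coefficient, and recombination rates are hypothetical choices made for illustration.

```python
def simulate_hitchhiking(r, sel=0.5, generations=40):
    """Deterministic haploid two-locus model of a selective sweep.

    Locus 1: favorable allele F (fitness 1 + sel) vs. f (fitness 1).
    Locus 2: neutral alleles A / a, linked to locus 1 with recombination rate r.
    Returns (final frequency of F, final frequency of A).
    """
    # Haplotype frequencies in the order FA, Fa, fA, fa.
    # F starts rare and only on A backgrounds; A starts at 0.01 + 0.19 = 0.2.
    x = [0.01, 0.0, 0.19, 0.80]
    fitness = [1 + sel, 1 + sel, 1.0, 1.0]
    for _ in range(generations):
        # Selection acts on locus 1 only; locus 2 is neutral.
        weighted = [xi * wi for xi, wi in zip(x, fitness)]
        total = sum(weighted)
        x = [w / total for w in weighted]
        # Recombination erodes linkage disequilibrium D between the two loci.
        d = x[0] * x[3] - x[1] * x[2]
        x = [x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d]
    freq_f = x[0] + x[1]
    freq_a = x[0] + x[2]
    return freq_f, freq_a


for r in (0.001, 0.5):
    freq_f, freq_a = simulate_hitchhiking(r)
    print(f"r = {r}: freq(F) = {freq_f:.3f}, freq(A) = {freq_a:.3f}")
```

With tight linkage (r = 0.001), the neutral A allele is carried along as F sweeps toward fixation, so diversity at the linked locus is lost even though nobody selected for it. With free recombination (r = 0.5), the association between the two loci decays within a few generations and A stays near its starting frequency of 0.2. Selecting for crossovers on both sides of the target gene, as Schnable suggests, roughly corresponds to raising the effective recombination between the selected allele and its neighbors.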