Name: Susan McCouch
Title: Professor, Plant Breeding and Genetics, Cornell University
Background: 1995-present, professor of plant breeding and genetics and plant biology, Cornell University, Ithaca, NY; 1990-1994, associate geneticist, International Rice Research Institute, Los Baños, the Philippines
Education: 1990 — PhD, plant breeding and genetics, Cornell University; 1982, MSc, University of Massachusetts, Amherst, Mass.
Years of hard work are starting to pay off for rice researcher Susan McCouch. The Cornell University professor recently partnered with Affymetrix to construct the GeneChip Rice 44K SNP-genotyping array, a 44,000-marker chip that McCouch hopes will help researchers identify rice varietals around the world and, ultimately, breed better crops.
McCouch and other researchers have been working on developing a high-density genotyping array for rice for five years, and, according to McCouch, rice breeders will now be able to use the 44K chip to classify their germplasm and to fingerprint and track varieties for improved seed management and IP protection.
Service providers Expression Analysis and DNA Landmarks have already begun offering the new rice chip, and McCouch is working with Affy to identify service providers in large rice-producing countries in Asia.
At the same time, McCouch is using other technologies, such as Illumina custom arrays and BeadXpress panels, to help her make breeding decisions: To date, she has sequenced more than 100 rice genomes in an effort to build an even higher-density rice-genotyping chip.
Last week at the Plant and Animal Genome Conference in San Diego, and again this week, BioArray News spoke with McCouch about the ways the rice community is adopting arrays and sequencing for its research.
The following is an edited transcript of those interviews.
Why did it take so long to develop a genotyping array for rice?
The primary reason the genotyping array for rice was not available before now was the lack of genome-wide sequence information on diverse varieties that is necessary to build a reliable genotyping chip.
We originally submitted a grant to the National Science Foundation in 2006 with the objective of building a genotyping array for association mapping in rice and the grant started in 2007, but at that time, there was no SNP discovery data available to build such an array. In mid-2008, we were granted prepublication access to a SNP discovery pool that was being generated by a group of researchers at the International Rice Research Institute (IRRI) in the Philippines and in US universities using Perlegen Sciences technology to resequence 20 diverse rice genomes. That project was collaboratively funded by the US Department of Agriculture and IRRI, and it was funded with the understanding that the data would be used to develop genotyping arrays for breeding applications as well as for other kinds of analysis. We were given early access to that data in order to build the 44K array. Interestingly, there were only 160,000 SNPs discovered in the whole project, and if you ran a first analysis to identify tag SNPs, there were only 31,000 tag SNPs. Because we wanted to build a 44K chip, we had to access other sources of sequence information, which existed in the form of bacterial artificial chromosome end-sequences from another NSF-funded project, the OryzaMap project. So I guess the answer to your question is, there wasn't a pool of resequencing information that was large and diverse enough to permit the development of a genotyping array previously.
We had two very good reference sequences for rice, the [japonica variety] Nipponbare genome and the [indica variety] 93-11 genome, but you couldn’t build a robust genotyping array based on the sequences of two genomes. But the availability of the reference sequences meant that it was relatively easy to resequence additional genomes and align to the reference sequences. Rice was the first crop to have a reference genome, and we were among the first to generate resequencing information, and now we are building all kinds of chips and other genotyping assays.
Where does sequencing fit into this mix?
Some people are leapfrogging right over fixed array technologies. Many people are leaping into genotyping by sequencing, and I think they are discovering some significant informatics challenges as they try to make sense of the data. The rice community is lucky because we have high-quality arrays to anchor our genomes and there's a sense that the informatics is easier when you have fixed arrays. Large diversity data sets can be readily generated using these arrays. A lot of people are starting to use the 44K chip because it is really fast, it is really high quality, it is efficient, and the data management is simple and straightforward. When you do any kind of resequencing you have to assemble short reads, you have to think about repetitive regions, misalignments, gaps. In contrast, a well-designed genotyping array has an ease of use and guarantee of quality that can be counted on.
I don't know how to compare our user community to others, but if I just compare the rice community to the maize community, there is a big difference because the vast majority of maize breeders work in the private sector, and the differential in terms of access to financial resources, information management resources, and the number of people trained in informatics and computational science, is enormous. In the global rice community, we are mostly dealing with breeders and geneticists working in the public sector, many living and working in the developing world, and many severely under-resourced. This is not a corporate crop, it is not primarily bred as a hybrid, and there is little money to be made. Rice primarily feeds the poor and everything about the way it is bred and distributed slants the rice community away from the ability to jump out and use fancy computational resources. So we wanted to develop something that was friendly to the user, especially toward the part of the scientific community that would have the least resources and is trying to develop rice varieties to feed the poorest people in the world. We did not want to bias our genotyping platform to favor groups that are well funded with lots of resources. I think the genotyping array meets those criteria.
People are happy because there are service providers who can process the genotyping arrays. They don't have to invest in the infrastructure or know anything about the workings of the technology. They send DNA or, more simply, tissue, and they get back genotypes. With existing analysis platforms, they can use the data immediately.
Where do you see this project going?
This chip is our phase one. We developed it in order to validate the SNP calling on the resequencing dataset and to try out the custom-array technology and see how efficient it would be for users, and also because we had to bide time until we had a large enough SNP discovery pool to build anything bigger. So it is phase one in the sense that we have met all the design criteria, and we have demonstrated that it is a very high-quality product that is easy to use. We even developed our own DNA prep for rice that does not require enzyme digestion and complexity reduction, and our own allele-calling algorithm that optimizes the ability to call SNPs reliably on small sample sizes and in collections of homozygous samples, which is characteristic of many crop species. As we release the 44K chip, we are working with commercial service providers, trying to iron out all of the little kinks that might come up so it is smooth sailing for users. There is a bit of tweaking around the edges to make sure the process of sample submission and data delivery goes smoothly so the clients really like it.
So there is a second chip that you have designed?
Yes, but it is just on the horizon at this point. There is momentum building in the rice community for diverse approaches to genotyping and I think the launch of this product right now is really timely because the community is excited about what genotyping can do and the resolution provided by the 44K chip fills a particular niche for the rice community.
The question is, 'Which of the many genotyping options do people want to pursue for different objectives?' If they have many options, as a community, we hope people will carefully consider which options are best for each application, and that they will be willing to try this one. If it is informative and easy to use and reliable and it doesn't have a lot of hidden costs, they will be back.
Do you think users will wait for the second, higher-density array rather than adopting the 44K?
No. The reality is that the rice community is probably one of the best endowed in terms of the array of different SNP platforms that have been developed for it and to empower it to do custom designs. I don’t think there is another plant community that has as many options as the rice community has now.
Do you have a shared database where you can deposit the information from using the 44K array?
Information about the array will be publicly available as supplemental information accompanying our publication on the array, and diversity data on rice that we generate as part of our NSF project will be completely public as well. Information about every SNP on the chip will be documented as an annotation track in the Gramene database and the rice diversity data will also be hosted on Gramene as soon as our paper is accepted for publication.
Have you used other technology platforms?
We actually built smaller custom Illumina chips before we built the 44K Affy chip. Several of the 384-SNP assays were designed specifically for IRRI researchers for a variety of breeding applications. We have designed low-, medium-, and high-density assays to ensure that different user needs were met. We are also doing quite a bit of resequencing ourselves.
In collaboration with IRRI and USDA researchers, we designed seven 384-SNP assays that are now in use all over the world using Illumina’s BeadXpress system. We coordinate an international Rice SNP Consortium that has resequenced 150 rice genomes to date using the GAIIx platform, and that serves as our SNP discovery pool for building the million-marker SNP chip. I am not so concerned about which technology or platform is used. For me, the most important thing is which one does the job.
What have you done right that could inform other researchers who are interested in putting together this kind of array for the crops they study?
The main thing we did was to have very reliable, high-quality sequence data going in. Rigorous bioinformatics is critical before you design a fixed array. If you are building a custom array, your design features are all your own, and having really good informatics means a really good product will be built. I keep telling people to do their homework upfront. Good data in, good data out.
The most frequent reason for failure of any SNP on a chip is that there are unknown indels or SNPs in the flanking sequences, and so one or both alleles fail to hybridize. When that happens, it comes up as a failed SNP, and because you are trying to interrogate unknown samples, you end up not gaining information. The best way to avoid this is to do a lot of SNP discovery early on, so you know what the probability of SNPs in these flanking regions [is] and you can target SNPs in the regions that are likely to succeed.
That's why we pushed our informatics inquiry to the maximum and we learned a lot by building this 44K chip. We also did a lot of posterior analysis to determine which SNPs failed and why, and what we learned helped us design a second chip that is a lot more sophisticated than this one. But this one is very good value. It met or surpassed all of our quality standards and stands as a very high-quality genotyping array for examining diversity in Asian rice, Oryza sativa.
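Editor's note: The flanking-sequence screen McCouch describes can be sketched in a few lines of Python. This is a minimal illustration, not the group's actual design pipeline; the function names, the 30-bp flank width, and the example positions are all assumptions made for illustration. The idea is to keep a candidate SNP only if no other known variant from the discovery pool falls inside the window a probe would cover.

```python
from bisect import bisect_left, bisect_right


def has_clean_flanks(pos, known_positions, flank=30):
    """Return True if no other known variant lies within `flank` bp of `pos`.

    `known_positions` must be a sorted list of variant positions (SNPs and
    indels from the discovery pool) on the same chromosome as `pos`.
    `flank` is a hypothetical probe half-width, not a value from the interview.
    """
    lo = bisect_left(known_positions, pos - flank)
    hi = bisect_right(known_positions, pos + flank)
    # Anything in the window other than the target SNP itself would sit
    # under the probe and could stop one or both alleles from hybridizing.
    neighbours = [p for p in known_positions[lo:hi] if p != pos]
    return not neighbours


def select_candidates(candidates, known_positions, flank=30):
    """Keep only candidate SNPs whose flanking windows are free of known variants."""
    return [pos for pos in candidates
            if has_clean_flanks(pos, known_positions, flank)]


# Made-up positions on a single chromosome, for illustration only.
known = sorted([100, 112, 500, 900, 925, 2000])
candidates = [100, 500, 900, 2000]
print(select_candidates(candidates, known))  # -> [500, 2000]
```

In this toy example, the candidates at 100 and 900 are dropped because known variants at 112 and 925 fall within their flanks. The screen can only be as good as the discovery pool behind `known_positions`, which is the reason for doing extensive SNP discovery before design.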
Is anyone in the rice community skeptical of array technology?
Only the people who have had a bad experience with a product because it was badly designed are skeptical. Some of these custom products are badly designed and the community tends to blame the technology first.
How long have you been using the technology?
Two years. And we all know the microarrays from the RNA side because we were using whole-genome rice GeneChips before. Using genotyping arrays is a different process. If you only put probes in genes, then they are in conserved regions, and you are more likely to have success using fixed arrays, which is why everyone has very good quality RNA chips. With genotyping chips, the variation in non-genic regions is constantly surprising you. If you only target the exonic portions of genes, which are conserved, you can get away without knowing as much up front about variation in the gene pool as a whole. If you also want to target non-genic regions of the genome, and probably 80 percent of the variation on our 44K chip is non-genic, you have to know a lot more about those flanking regions because there is no selection to keep them from being variable.
Some communities haven't had enough a priori information to build a good chip, and then they get disappointed in what they get out. Those communities may not have had the SNP discovery pool they needed at the beginning, or they may have lacked the expertise to design a high-quality array. One of the things I have learned is that we need to train more people at the interface of biology and informatics because people who have the informatics skills are few and far between. We have trouble recruiting them. So I worry about the enthusiasm for resequencing-based approaches because I’m not sure people will be able to handle that much data effectively.
Do you run your arrays at Cornell or are you outsourcing it to another lab?
Most of the work we have done so far on our project has been done at Cornell. Now we are starting to outsource because as a university, we are not set up to be a service center. When it was part of my research, we did it in house, so all of the 600-plus samples we have run so far are based on what we have done at the BioResource Center at Cornell.
More recently, we have done some beta testing with our service providers. We have provided them with the DNA prep protocols and SNP-calling algorithm that we have developed and we have trialed each step and everything is working.
Your research effort is international, but the two service providers announced so far are in North America. How do you intend to reach users in other countries?
That is an interesting and important story. Rice is primarily an Asian crop. Ninety percent of the world's rice is produced in Asia. We clearly need service providers in Asia to make it a little bit easier. We are reaching out to service providers in Singapore, Hong Kong, India, China, and Japan. We are particularly interested in setting up reliable service providers in countries like India and China because they don’t allow their native rice germplasm to be sent out of the country, so researchers in those countries cannot use the chip we have designed unless they can do so inside their country.
China is on the resequencing road and I don’t know how interested researchers in China will be in using the fixed array; certainly we cannot compete with the Beijing Genomics Institute for resequencing. There are also countries like Thailand that are major rice producers and exporters that might want to use it, but again, it will have to be within Thailand because of the restrictions on exportation of rice germplasm.
Rice is unusual in this regard. It represents food security and natural resource security for three of the most populated countries in the world — China, India, and Indonesia — all of which have enormous pressure to increase food production. These genotyping arrays, and the higher-resolution options on the horizon, represent an obvious starting point for genetically characterizing the diversity of gene bank collections, which would make those collections of seeds much more useful for plant breeding. As more and more people use the arrays, the diversity of different rice strains can be compared, and people will be able to determine if what they have in their own country is really different than what others have, and to think about how to utilize the variation that is at their fingertips.
The 44K chip and other genotyping technologies are part of a larger perspective on enhancing our understanding of the diversity that breeders have to work with, and making that diversity more transparent. One of the major challenges in rice research is to utilize water more efficiently, shifting away from the irrigated paddy system because fresh water is becoming limited, dealing with climate change, which is shifting rainfall patterns and disease pressure, and trying to use natural variation in new ways to address these problems of productivity and sustainability.
How are the breeders responding to this new technology?
Most plant breeders are very excited about this new technology for genotyping. They are interested because while they know which varieties perform well in a given environment, they often don't know why. When they make crosses, breeders recombine alleles and select the best performing offspring. Having an idea of the kind of recombinant offspring they want to create, even before they achieve it, would allow them to use SNPs to select the best performing plants and increase the efficiency of the breeding process. Genomics-assisted breeding will drive the field toward a more hypothesis-driven [approach] rather than a black-box approach, and will dramatically alter the way breeding programs are managed.
I like the idea that we are releasing the 44K chip at this time, because it is an incremental move for the rice community. Breeders are used to using 384-SNP assays, but most can't go from 384 SNPs to resequencing in one leap. There is no capacity to absorb that much information. And so this is a perfect intermediate.
Is it true that you are the number one importer of exotic rice after the USDA?
That’s right. After the USDA, McCouch is on the map for importing exotic rice into the US. People ask, "How come there is so much exotic rice in upstate New York?" Of course, we don’t grow it in the field up here. We just grow it under glass and do a lot of genetics.