Name: Erik Legg
Title: Group Leader of Omics Application and Research, Syngenta Biotechnology, since 2006
Experience: Global Head of Molecular Breeding for Vegetables, Syngenta Seeds, Toulouse, France, 2002 – 2006
Molecular Markers and Cell Biology Labs Manager, Novartis, 1998 – 2002
Marker Assisted Breeding Scientist, Amycel, 1993 – 1998
Education: MS in plant genetics, UC Davis
In recent years, agricultural companies have begun to implement next-gen sequencing in their research with the goal of improving crop yields and identifying the genetic variation responsible for traits such as drought and pest resistance.
For example, Bayer CropScience has sequenced the rapeseed genome in collaboration with BGI, Monsanto is working on sequencing the cotton genome, and Syngenta is contributing to the International Rice Genome Sequencing Consortium to produce a finished draft of the rice genome.
Recently, In Sequence spoke with Syngenta's head of Omics Application and Research, Erik Legg, about how the ag-bio giant is making use of sequencing technology, the types of projects it is doing, and how the technology is advancing plant genomics.
What types of sequencing projects are you doing at Syngenta?
Our sequencing activities at Syngenta kicked off with a fairly big project — transcriptome sequencing of a large maize panel. We started that a couple of years ago, and that's been the springboard for other activities across many other species. We've broadened to whole-genome sequencing and are doing a tremendous amount of resequencing for SNP discovery, and to support crop improvement and mode of action research in other species. We have also done some work in methylome sequencing, so bisulfite sequencing.
What sequencing platforms are you using?
We are fairly agnostic except for the fact that we believe much more strongly in the [Illumina] platform than the others. The vast majority of our data, more than 90 percent, is coming from Illumina.
We use other techniques for specific applications. The other platforms — [Life Technologies'] SOLiD and [Roche's] 454 — do some things particularly well, better than [Illumina], and we use those in those instances. Initially, for some of the de novo sequencing, we were using 454 quite a bit to take advantage of longer reads. We have used the SOLiD platform to take advantage of its multiplexing capability. And we hope someday to use the much-promised PacBio for methylome work.
Are you doing the sequencing in house or elsewhere?
Our model for data generation has been very much a mixed model. We have a strategic collaboration with [the National Center for Genome Resources], which does a large part of our sequencing. And we have machines internally as well. The internal machines are all Illumina.
How many do you have?
Let's just say more than enough to keep our friends at [bioinformatics partner] GenomeQuest really occupied.
And are those all the Illumina Genome Analyzers?
Yes, we are still on the GAs and we are shopping for HiSeqs. We plan to migrate over to HiSeq.
Are you looking at other platforms?
We're always looking. We try to be flexible and consider all the options and put those together in an agnostic manner. We're just not seeing anything else right now that's going to displace the Solexa chemistry.
We'd really like to figure in PacBio. When their reality catches up with their marketing we'll be ready for it.
Some have argued that for plant genomes, whole-genome sequencing is best done with the longer reads of 454 because plant genomes tend to have many repetitive regions and are multiploid. But you've not found that to be the case?
We've experienced some of that as well. It depends on the species, of course, because their composition varies. But the way I would spend my money is on variations on paired-end reads. You can almost reach the equivalent of a 454 read if you play the paired ends right. So, again, we have used 454 to do gap filling. But it's the GA II that's driving the sequencing projects, and the 454 is an accessory.
Is next-gen sequencing becoming a trend in the agricultural industry? Are other companies investing in it?
It's beyond a trend; it's the reality. All the major agriculture companies are very heavily invested and very competitive in this space. Monsanto has an equity stake in PacBio, for example. Certainly some of that is an investment, but some of that is about having access [to their technology].
So, I would say, the project that kicked off next-gen sequencing for us — this transcriptome analysis across a large panel of maize — was really a sea change for us and a sea change for the agriculture industry in terms of really drilling down into what makes genetic diversity tick, and how we could leverage that to make better products. That was huge for us in terms of generating internal business interest in going farther and faster with this technology.
Why was that project such a shift?
Part of it stems from Syngenta's history as a leader in microarray technology for gene expression. Syngenta was the first plant company to have an internal Affymetrix platform, and I'm fairly certain we have the largest collection of Affymetrix gene expression data in the plant industry. So, for us internally, jumping off a platform that we're very heavily invested in and into next-gen sequencing was a real leap of faith for the company. And it bore fruit, which then made it quite easy for other crops to transition to the new platform.
The justification for the platform was based on three legs. One, it was a higher-performing expression platform. The second was the need to do de novo sequencing in crops and organisms that were not being handled in the public domain. And the third was resequencing for SNP discovery across many different crops.
It's a rare platform that can prove itself to be robust across three major activities like that. So the flexibility made it much easier to make the original investment and the culture change that went with it.
Are you mainly doing transcriptome sequencing projects?
The slight majority of our throughput right now is resequencing, and that relates to still wanting to understand the genetic diversity in the crops that we're working with and how better to leverage that for creating new varieties and better products. And another aspect of that is evaluating a transition to routine whole-genome sequencing.
We're using [the Illumina] machine to execute different types of techniques for different crops, including microbes and insects as well. The choice of what projects we do is driven by a tight interaction between the business objectives and the technical capabilities.
There's a good bit of discovery going on [with] this platform, but there's very little activity that isn't lined up with a specific business objective in mind. There's not a lot of prospecting here; it's a very targeted activity.
Virtually all the research is aimed at understanding how important traits are manifested by plants in certain environments. So one of the great things about this platform is you can go genome wide, you can go targeted in a very deep fashion, and you can look at different developmental stages and different tissues. And the platform is able to work in all those situations.
How many species of plants, insects, and microbes have you sequenced?
We've sequenced 25 plant species, and that's a mixture of species that have been sequenced in the public and internally. There are quite a few crops where we have generated a reference genome through a consortium. And [for] probably a third of those we've done our own de novo sequencing.
[As for] the insects and microbes, the numbers for the microbes are in the tens and in the insects it's fewer — I would say several. And the reason that those numbers are small is that the uptake has been much more recent. So I'm expecting that those will grow over the coming year or two.
What have been some of the hurdles and challenges in implementing sequencing?
Well, you have to say data. Everyone's drowning in data. Beyond just the data, [it's] the really rapid evolution of the technology, the constant change in this space. So those two things [have] really driven us to the mixed model approach for the data generation and also a mixed model approach for data analysis and data storage. [Collaborating] with NCGR and with GenomeQuest has allowed us to share responsibility for keeping up with the baseline trends and at the same time to focus more of our internal activity on methods development and data analysis that are focused on the problems Syngenta is trying to solve.
How has bringing in next-gen sequencing changed or advanced the type of research that you do?
Dramatically. At the simplest end, it has really extended the reach of a wave that was already happening in traditional breeding. The marker-assisted breeding space has become a much more aggressively applied tool. I would say in the comparative genomic space, the dramatically falling cost of sequence has allowed us to apply genome-wide treatment to crops and species that three years ago we were not talking about aggressive research in.
For example, we now have whole-genome sequences for crops like watermelon, cucumber, and melon. If, a decade ago, you had asked someone when those genomes were going to be sequenced, they probably would have said 2050.
So it has really accelerated a trickle down from Arabidopsis-type genome research into secondary and tertiary crops. Which, for a company like Syngenta that's selling almost a hundred different species, is really revolutionary.