By Monica Heger
BGI has recently expanded its collaborations in the agricultural space, targeting projects to sequence species important for food and industrial use, as well as endangered species and evolutionarily important species.
At this month's International Conference on Genomics in Shenzhen, China, the institute highlighted some of its recent collaborations and progress in the field. In a presentation, Wang Jun, BGI's executive director, said that the goal is to sequence one million plant and animal genomes.
The first part of the project is to sequence reference genomes. BGI has already completed the sequencing and de novo assembly of 388 animal genomes and 152 plant genomes, Wang said in his presentation.
The second part is to sequence additional genomes for each species to better understand how genetic variation contributes to traits such as drought resistance in food crops. BGI has sequenced just over 25,000 of these so-called variation genomes.
"One genome doesn't tell you much," Wang told In Sequence, adding that knowledge about the diversity of phenotypes and genotypes is much more useful.
A better understanding of that variation will be important for applications like crop breeding, for example to produce strains that carry genes for pest resistance or drought tolerance. Understanding the genetics of those plants, and which strains contain the desirable genes, will enable better breeding.
BGI is also focusing on endangered species such as the panda, said Wang. BGI has sequenced a "few percent" of the entire panda population, estimated to be between 1,500 and 3,000, in order to understand the genetic diversity of the animal, which is important for conservation biology.
Additionally, as part of its ag-bio strategy, BGI is forming partnerships with academic institutions that specialize in specific areas. For instance, at the conference, the institute announced a collaboration with Oregon State University to sequence the genomes and transcriptomes of Phytophthora plant pathogens, as well as an expanded partnership with the International Rice Research Institute to sequence 10,000 rice varieties. Earlier in November, BGI and the International Maize and Wheat Improvement Center in Mexico agreed to collaborate on the sequencing of maize and wheat.
Brett Tyler, who is currently at the Virginia Bioinformatics Institute but will soon be director of OSU's Center for Genome Research and Biocomputing, provided In Sequence with additional details of the Phytophthora project, which aims to sequence all known species in the genus. Currently, there are around 120 known species, Tyler said, but new ones are discovered every year, so the final project could include the sequencing of 150 species.
The initial pilot project with BGI will include the whole-genome sequencing of 27 species, as well as transcriptome sequencing of those species at two different stages of development. Nine of those species will be from Tyler's group at OSU, while other collaborators that are part of the Phytophthora Genus Sequencing Consortium will contribute the additional 18 species.
BGI will do all of the sequencing, assembly, and annotation, and Tyler said the pilot could be completed within the next six months.
For the larger-scale project, the consortium is seeking funding to sequence and assemble the remaining 130 species.
To sequence the pathogen, BGI will use paired-end sequencing of multiple libraries with varying insert sizes, including sizes of 100 base pairs, 350 bp, 800 bp, 2 kilobases, 5 kb, 10 kb, 20 kb, and 40 kb. Assembly will be done using BGI's SOAPdenovo algorithm.
The reason for constructing so many different libraries is that the Phytophthora genome contains a large number of transposable elements and repetitive regions. There are many copies of virulence genes, and assembling them correctly is a major hurdle, Tyler said.
BGI's strategy of constructing libraries with many different insert sizes is a "unique strategy" that "we think will be helpful for teasing apart these complicated regions," he said.
Because the transposons and virulence genes are so repetitive, figuring out exactly where they lie in the assembly requires read pairs in which one read lies within the transposon or gene and the second read lies within a less repetitive region of the genome, "providing an anchor," Tyler explained.
Because the different transposons and genes all have different lengths, and lie within repetitive regions of varying lengths, constructing libraries with many different insert sizes "gives you the best chance for anchoring each gene to a unique sequence," Tyler said.
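The anchoring idea Tyler describes can be illustrated with a small sketch. It is a simplified geometric model, not BGI's or SOAPdenovo's method: it assumes a repeat flanked by unique sequence, a fixed insert size per library, and a read length of 100 bp (the article does not state the read length). The function names are hypothetical. The hardest read to anchor is taken to be one at the middle of the repeat, since it is farthest from either flank.

```python
# Insert sizes (bp) of the libraries listed in the article.
INSERT_SIZES_BP = [100, 350, 800, 2_000, 5_000, 10_000, 20_000, 40_000]


def mate_is_anchored(repeat_len, read_start, insert_size, read_len):
    """Return True if the mate of a read inside the repeat lies past the repeat.

    Coordinates are relative to the repeat, which spans [0, repeat_len).
    The insert size is treated as the outer distance between the two reads.
    """
    mate_start = read_start + insert_size - read_len
    return mate_start >= repeat_len


def smallest_anchoring_insert(repeat_len, read_len=100, insert_sizes=INSERT_SIZES_BP):
    """Smallest library whose pairs can anchor a read at the repeat's midpoint.

    read_len=100 is an assumption for illustration only.
    Returns None if no library in the list is long enough.
    """
    read_start = max(0, (repeat_len - read_len) // 2)
    for size in sorted(insert_sizes):
        if mate_is_anchored(repeat_len, read_start, size, read_len):
            return size
    return None


if __name__ == "__main__":
    for repeat_len in (1_000, 6_000, 30_000, 100_000):
        print(repeat_len, smallest_anchoring_insert(repeat_len))
```

Under these assumptions, a 1 kb repeat is first anchored by the 800 bp library and a 6 kb repeat by the 5 kb library, while a 100 kb repeat is beyond the reach of even the 40 kb library. That is one way to see why a wide range of insert sizes helps with repeats of varying lengths.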
Sequencing will be done to between 50- and 100-fold coverage, which will also help assemble those regions, he added.
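For a rough sense of scale, fold coverage is the total number of sequenced bases divided by the genome size. The sketch below turns a coverage target into a number of read pairs. The genome size and read length used in the example are illustrative assumptions, not figures from the project, and the function name is hypothetical.

```python
import math


def read_pairs_needed(genome_size_bp, fold_coverage, read_len_bp):
    """Read pairs required so that total bases = genome size x fold coverage.

    Each pair contributes two reads of read_len_bp bases.
    """
    total_bases = genome_size_bp * fold_coverage
    return math.ceil(total_bases / (2 * read_len_bp))


if __name__ == "__main__":
    # Assumed values for illustration: a 100 Mb genome and 100 bp reads.
    genome_size_bp = 100_000_000
    read_len_bp = 100
    for coverage in (50, 100):
        print(coverage, read_pairs_needed(genome_size_bp, coverage, read_len_bp))
```

With those assumed inputs, the 50- to 100-fold range works out to between 25 million and 50 million read pairs per genome, spread across all libraries.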
Following the pilot, the team will re-evaluate the sequencing strategy to determine the optimum number of libraries to construct, the correct insert sizes, and the fold coverage for the larger project to sequence every species in the genus.
For the pilot, the team chose species that are closely related to species for which good quality genomes already exist, so that the new assemblies can be compared with those assemblies.
Currently, three species have fully assembled genomes: Phytophthora sojae, which causes soybean root rot; Phytophthora infestans, which causes potato blight and was the pathogen responsible for the potato famine in the 1840s in Ireland; and Phytophthora ramorum, which causes sudden oak death and has been particularly destructive in California.
Sequencing of these genomes was done using shotgun sequencing of fosmid and plasmid libraries, as well as by constructing physical maps using BAC clones.
Tyler said the group will be "using what we know about those genomes to help interpret the new genomes."
The project aims to sequence every species in the genus because the genes involved in virulence change very rapidly. The pathogens are extremely adaptable, Tyler said, so the "virulence genes that the new species use will be quite different" from those of the species that have already been sequenced, even though they are very closely related.
Because the pathogens change so quickly, identifying which genes cause virulence will be tricky: simply finding the genes that have been conserved will not necessarily pinpoint the virulence genes.
To help identify the pathogenic genes and transposons, the team is sequencing each species at two different stages of development. At one stage, the organism has the ability to grow by itself on dead plant matter, and it uses a different set of genes to survive in this stage than when it is infecting living tissue, Tyler said. By comparing the transcriptomes of those two stages, the researchers will be able to pinpoint the genes that are necessary for infection and that affect the host's immunity. "Those are the genes we're particularly interested in," he said.
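The two-stage comparison can be sketched as a simple fold-change filter. This is a minimal illustration, not the consortium's analysis pipeline: it assumes each stage has already been reduced to one normalized expression value per gene, and the gene names, values, threshold, and pseudocount below are all made up for the example.

```python
import math


def infection_induced(expr_saprophytic, expr_infection, min_log2_fc=1.0, pseudocount=1.0):
    """Return genes whose expression rises during infection.

    Both arguments map gene name -> normalized expression value.
    A pseudocount avoids division by zero for genes silent in one stage.
    The result maps gene name -> log2 fold change (infection / saprophytic).
    """
    induced = {}
    for gene, infect in expr_infection.items():
        sapro = expr_saprophytic.get(gene, 0.0)
        log2_fc = math.log2((infect + pseudocount) / (sapro + pseudocount))
        if log2_fc >= min_log2_fc:
            induced[gene] = log2_fc
    return induced


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    growing_on_dead_matter = {"geneA": 3.0, "geneB": 50.0, "geneC": 0.0}
    infecting_living_tissue = {"geneA": 31.0, "geneB": 52.0, "geneC": 7.0}
    print(infection_induced(growing_on_dead_matter, infecting_living_tissue))
```

In this toy input, geneA and geneC are flagged as infection-induced, while geneB, which is expressed at a similar level in both stages, is not. A real analysis would also need replicates and a statistical test rather than a bare threshold.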
The overall goal is to eventually understand enough about the pathogens to devise effective ways to prevent infection and also to detect infection early on, Tyler said.
While the virulence genes are constantly evolving, which will make developing a common prevention strategy difficult, "we are hoping to identify a core set of virulence genes or a conserved set of motifs within the virulence genes that we can target," Tyler said.
"By having a full picture of the genes that are evolving, we have a much better sense of what's essential for the pathogen, and what's dispensable from the pathogen's perspective," he said.
The diseases that are caused by these pathogens lead to tens of billions of dollars in lost crops in the US each year and are difficult to control with chemicals. The ultimate goal will be to develop crops that are resistant to the pathogen.
Aside from OSU and BGI, other partners participating in the consortium are the US Department of Agriculture's Agricultural Research Service; the University of British Columbia; the University of California, Riverside; Pennsylvania State University; the James Hutton Institute; Wageningen University; North Carolina State University; Nanjing Agricultural University; and the UK's Forest Research Agency.