An international team of scientists led by the University of Alberta plans to use next-generation technology to sequence and assemble de novo the gene transcripts of 1,000 plants.
Funded with C$2 million ($1.63 million) from the government of Alberta, the scientists will initially seek to increase the number of plant species for which transcript sequence information is publicly available and to learn about their biology and evolutionary history. In later phases, the initiative might focus on commercial applications of the results. The Beijing Genomics Institute in Shenzhen will provide the sequencing for the project.
The idea for the project, called the 1000 Plants Initiative, took shape earlier this year, according to Gane Ka-Shu Wong, a professor in the departments of biological sciences and of medicine at the University of Alberta, who leads the initiative.
After the 1000 Genomes Project took off earlier this year, Wong and his colleagues in China were thinking about other projects to pursue. “We wanted something a little bit more affordable but yet with a similar impact,” recalled Wong, who is also a founder and director of BGI Shenzhen.
They came up with a project that would expand the knowledge of plant biodiversity: sequencing the transcriptomes of approximately 1,000 plant species in order to learn about their phylogeny, metabolism, and other aspects of their biology.
Funding for the C$2 million initiative comes from the Alberta government. The project is also supported by the Alberta Agricultural Research Institute, Genome Alberta, the University of Alberta, BGI, and Musea Ventures, a US-based venture-capital firm.
According to Wong, fewer than 100 plant genomes have been characterized by sequencing so far, even at the EST level, judging by data submitted to GenBank.
The idea of the initiative is to shotgun-sequence plant transcriptomes with new sequencing technologies and assemble the data with algorithms developed at BGI, according to Wong. “Think of it as a metagenomics assembly,” he said.
Transcriptome data will be faster and cheaper to obtain than whole-genome data. “Of course, over time, it’s going to be full genomes. But not in the next year or two,” he said.
Wong and his colleagues at BGI have already been “test-running” the technology and assembly algorithms on transcripts from rice, a species for which a reference genome and cDNA sequences are available for comparison. The results are “not too bad” so far, yielding “more than an EST, [but] less than a full-length cDNA,” according to Wong.
“The assembly problem is a lot easier for transcripts because most of the repeats, particularly in plants, are not sitting in the protein-coding regions or the UTRs,” he explained. “Even the gene duplications don’t really hurt you all that badly because the genes are so diverged that they don’t screw up the assembly.”
Though he said the assembly results will be neither perfect nor complete, “we can get an awful lot more transcripts than has been done before.”
At a meeting in Vancouver, BC, in January, participants and potential collaborators will discuss details of the project, including which species to select. Interested parties hail from North America, Europe, and Asia, and include plant scientists and major botanical gardens. Among them is the iPlant Collaborative, which was funded with $50 million from the National Science Foundation earlier this year to create a “cyberinfrastructure center” for plant biology questions. The 1000 Plants Initiative is still open for additional participants.
Two foci of the project will be medicinal plants and algae, and endangered species will also be included. “The intent is to choose the 1,000 plants collaboratively,” according to Wong.
During its first phase, which will last approximately two years, the project will aim to sequence and analyze transcriptomes for approximately 1,000 species. Most likely, much of the data will be generated on BGI’s Illumina Genome Analyzers, although “we will use the technology that makes most sense,” Wong said.
All data will be deposited in GenBank, and participants will be required to agree to an open-access policy. The results from this effort will likely yield new insights into phylogenetic relationships, metabolic pathways, and other aspects of plant biology, according to Wong.
The second phase of the project is less well defined, he said, but the hope is that it will lead to commercial efforts.
But first, the researchers need to “prove to ourselves and the rest of the world that this is useful, even at this transcript level of incompleteness and imperfect data,” he added.