A working draft of the genome of the rice subspecies indica is now available to the public thanks to the efforts of the Genomics and Bioinformatics Center of the Beijing Genomics Institute at the Chinese Academy of Sciences.
Matthew Huang, deputy director of BGI, said the success of the project rests largely on the shoulders of the 100-strong bioinformatics team at BGI, which developed three new algorithms to meet the unique assembly and annotation demands of the rice genome.
Huang noted that the support of Sun Microsystems was also crucial to BGI’s accomplishment. The genome assembly was conducted on a Sun Enterprise 10000 server, while BGI’s status as a Sun Center of Excellence in Genomics brought additional support for the development of the algorithms that made it all possible.
“We try to support our partners through various ways,” said Stefan Unger, business development manager for computational biology in Sun’s global education and research group. “We make sure they can use our equipment well and we make sure that we develop a community of users that can talk to each other and share their experiences. That’s the purpose of the Center of Excellence program.”
The accomplishment was lauded by the scientific community. “Our Chinese colleagues have given the world a wonderful gift by deriving a highly useful draft of the instruction book for this incredibly important crop species,” said Francis Collins, director of the National Human Genome Research Institute.
“The public availability of rice genome sequence will have an immediate and salutary effect on the scientific community,” added Eric Lander, director of BGI’s sister center, the Whitehead Institute Center for Genome Research.
The draft sequence data (4X coverage, with 95 percent of the coding region identified) is publicly available at http://btn.genomics.org.cn/rice.
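What "4X coverage" means in practice can be illustrated with some back-of-the-envelope arithmetic. Coverage is conventionally the total number of sequenced bases divided by genome size. Under the idealized Lander-Waterman (Poisson) model, the expected fraction of bases hit by at least one read is 1 - e^(-c) at coverage c. The 500-base read length below is an assumption for illustration, and the function names are hypothetical. This figure refers to raw genome coverage, which is a different quantity from the article's 95 percent figure for coding regions.

```python
import math

def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total bases sequenced divided by genome size."""
    return num_reads * read_length / genome_size

def expected_fraction_covered(c: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases covered by >= 1 read."""
    return 1.0 - math.exp(-c)

# Hypothetical inputs: the 430-Mb indica genome and an assumed 500-base read length.
genome_size = 430_000_000
read_length = 500
reads_needed = math.ceil(4 * genome_size / read_length)  # reads required for 4X depth

c = coverage(reads_needed, read_length, genome_size)
print(reads_needed, round(c, 2), round(expected_fraction_covered(c), 4))
# prints: 3440000 4.0 0.9817
```

The model ignores repeats and sequencing bias, so a real 4X draft typically has more gaps than this estimate suggests.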
The Beijing rice sequencing project began in May 2000. Its speed stands out when compared with the status of the International Rice Genome Sequencing Project (IRGSP), which began in 1998 and is not expected to be complete for several more years.
Huang said the “real sequencing effort took off in July 2001,” with the main sequencing requiring around three months’ work, a few weeks for the assembly, and around another two months for annotation.
Huang noted that while the IRGSP is taking a chromosome-by-chromosome, clone-by-clone approach that distributes the sequencing among its 10 participating sequencing centers, the BGI team used a whole-genome shotgun approach. This sped up the process considerably, once the team had overcome the substantial assembly challenges that whole-genome shotgun sequencing presents.
Putting the Pieces Together
The BGI developed three new algorithms to tackle the demands of whole-genome sequencing. The first, a repeat-masking algorithm, “was essential in our successful assembly of the rice genome scaffolding,” said Huang. In addition, the team improved the Phrap assembly algorithm and wrote a specialized gene-finding algorithm.
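The article does not describe how BGI's repeat-masking algorithm works. The general idea behind repeat masking in shotgun assembly can still be sketched. Sequence that occurs far more often than coverage alone would predict probably comes from a repeated element, and hiding it keeps the assembler from joining reads that belong to different copies of that repeat. The sketch below illustrates one generic approach: count k-mers, then mask any that exceed a threshold. It is not BGI's method, and the function names, k and threshold are hypothetical.

```python
from collections import Counter

def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in any of the reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def mask_repeats(read: str, counts: Counter, k: int, threshold: int) -> str:
    """Replace every base covered by an over-represented k-mer with 'N'."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] > threshold:
            for j in range(i, i + k):
                masked[j] = "N"
    return "".join(masked)

# Toy reads in which "ACGT" plays the role of a repeated element.
reads = ["ACGTTTTT", "GGACGTGG", "CCCCACGT", "ACGTAAAA"]
counts = kmer_counts(reads, k=4)
print([mask_repeats(r, counts, k=4, threshold=3) for r in reads])
# prints: ['NNNNTTTT', 'GGNNNNGG', 'CCCCNNNN', 'NNNNAAAA']
```

In a real assembler the threshold would be set relative to the expected coverage rather than fixed by hand.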
“The current gene-finding algorithms like Grail and Genscan are good for mammalian gene-finding, but they’re less effective for plants,” said Huang. BGI tailored the new algorithm to the distinctive GC-content gradient of the rice genome to help it identify exons and introns.
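The article does not say how BGI's gene-finder uses GC content. The underlying signal can be sketched, though: the fraction of G and C bases measured in sliding windows along a sequence. A gene-finder could use a profile like this as one feature when scoring candidate exons. The function names, window size and toy sequence below are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def gc_profile(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC fraction in sliding windows, as (window start, GC fraction) pairs."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]

# Toy sequence whose GC content falls from the 5' end toward the 3' end.
seq = "GCGCGCGC" + "GCATGCAT" + "ATATATAT"
for start, gc in gc_profile(seq, window=8, step=8):
    print(start, gc)
# prints: 0 1.0 / 8 0.5 / 16 0.0 (one pair per line)
```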
The BGI gene-finder identified a surprisingly large number of genes. Although the rice genome, at 430 megabases, is one-seventh the size of the human genome, it has twice as many genes: 60,000. This may be due to a lack of alternative splicing in plant genomes, but Huang said BGI identified more alternative splicing events in rice than initially expected. The group is now validating this experimentally.
BGI verified its assembly with data from the Chinese Academy of Sciences, which is participating in the international rice sequencing effort. “We compared our scaffolds with their assembled BAC sequence and the results are consistent,” said Huang. BGI also verified its gene predictions against publicly available data.
Two Strains are Better than One
The first public draft sequence of the world’s largest cereal crop is certainly worth celebrating, and the BGI’s selection of the indica subspecies should prove especially useful, according to Huang.
While the IRGSP and the two private rice sequencing groups at Monsanto and Syngenta are sequencing the japonica strain, Huang said the indica subspecies is older and “there are more indica planted in the world than japonica.” In addition, the indica subspecies is the paternal cultivar of a Chinese hybrid rice whose yield per hectare is 20 percent to 30 percent higher than the average of other rice crops. Researchers hope that careful study of the indica genome will reveal the source of this high yield.
Huang also said the two rice strains are of particular biological interest because in an evolutionary sense, “they’re on the verge of reproduction isolation.” At 430 megabases, the indica genome is about 50 megabases larger than the 380-megabase japonica genome, yet their coding regions are almost identical. “Having the data on both will be an enormous advantage for us to understand rice genetics and biology,” said Huang.