En Route to 1K Reference Genomes, BGI Completes Sequencing and Assembly of 95 Species

By Monica Heger

One year after launching its 1,000 Plant and Animal Reference Genomes project, China's BGI said this week that 505 species are either completed or in progress, including 152 plants and 353 animals. Of those 505 species, sequencing and assembly have been completed for 95, and the institute expects to have completed 200 species by the end of the year.

The project, announced in January 2010, aims to sequence the genomes of 1,000 plant and animal species (IS 1/12/2010). BGI said that it chooses species based on their scientific and economic importance. It is still selecting which ones to sequence and is also looking for collaborators.

So far, completed genomes include the wild soybean, cacao, orangutan, and honey bee. Details on the proposed species and the status of each genome are available on the institute's website. The site also includes information on genomes being sequenced under the Genome 10K project, an effort led by the University of California, Santa Cruz, that aims to sequence and analyze the genomes of at least 10,000 vertebrate species (IS 11/10/2009). BGI is coordinating its activities with the G10K project.

Illumina is the primary platform being used for the sequencing, said Joyce Peng, marketing director at BGI Americas, but "we are open to all new technologies."

BGI is doing de novo shotgun sequencing for each species, and uses a slightly different strategy depending on whether the genome is common or more complex.

Common genomes are defined as being either haploid or homozygous diploid, with a GC content between 35 percent and 65 percent, repeat content comprising less than half the genome, and a heterozygosity rate of less than 0.5 percent.

For these genomes, BGI is constructing multiple paired-end sequencing libraries with insert sizes of 200 base pairs, 500 base pairs, 800 base pairs, 2 kilobases, 5 kilobases, 10 kilobases, and 20 kilobases.

It is sequencing to a total depth of over 60-fold and using its SOAPdenovo algorithm for assembly. For the assemblies, it is aiming to achieve more than 95 percent coverage of the euchromatic region and over 98 percent coverage of the gene region, as well as an N50 contig size of more than 20 kilobases and an N50 scaffold size of over 300 kilobases.
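The N50 targets above can be checked with a few lines of code. The following is a minimal sketch, not BGI's pipeline: it uses the standard definition of N50 (the length at which sequences of that length or longer make up at least half of the assembly), and the function names `n50` and `meets_n50_targets` are hypothetical. Only the two thresholds, 20 kilobases for contigs and 300 kilobases for scaffolds, come from the article.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths (standard definition)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def meets_n50_targets(contig_lengths: Iterable[int],
                      scaffold_lengths: Iterable[int]) -> bool:
    """Check the article's stated targets: contig N50 > 20 kb, scaffold N50 > 300 kb."""
    return n50(contig_lengths) > 20_000 and n50(scaffold_lengths) > 300_000
```

For example, contigs of 30, 25, and 5 kilobases have an N50 of 30 kilobases, because the single longest contig already accounts for half of the 60-kilobase total.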

For complex genomes, which include polyploid genomes and those with many repetitive sequences, abnormal GC content (either below 35 percent or above 65 percent), or a heterozygosity rate greater than 0.5 percent, BGI is employing BAC or fosmid sequencing along with next-gen sequencing.
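The common-versus-complex criteria described above amount to a simple decision rule, sketched here for illustration. The `GenomeProfile` class and `classify` function are hypothetical names, not part of any BGI tool. Two assumptions are made where the article is silent: a heterozygosity rate of exactly 0.5 percent is treated as complex, and "many repetitive sequences" is taken to mean repeats making up half the genome or more.

```python
from dataclasses import dataclass


@dataclass
class GenomeProfile:
    ploidy: int                    # 1 = haploid, 2 = diploid, above 2 = polyploid
    gc_percent: float              # GC content, in percent
    repeat_percent: float          # share of the genome made up of repeats, in percent
    heterozygosity_percent: float  # heterozygosity rate, in percent


def classify(profile: GenomeProfile) -> str:
    """Label a genome 'common' or 'complex' using the thresholds quoted in the article."""
    is_common = (
        profile.ploidy in (1, 2)                     # haploid or diploid, not polyploid
        and 35.0 <= profile.gc_percent <= 65.0       # GC content in the normal range
        and profile.repeat_percent < 50.0            # repeats under half the genome
        and profile.heterozygosity_percent < 0.5     # assumption: exactly 0.5 is complex
    )
    return "common" if is_common else "complex"
```

Under this sketch, a diploid genome with 41 percent GC, 30 percent repeats, and 0.1 percent heterozygosity is labeled common, while the same genome as a tetraploid is labeled complex.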

For these genomes, BGI still constructs libraries of different insert sizes and sequences to 50-fold coverage using next-gen platforms, but it also adds BAC or fosmid sequencing. For the BAC or fosmid libraries, the BGI researchers are also doing shotgun sequencing, such that each clone has 50-fold coverage.
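To give a sense of scale for the coverage figures above, fold coverage is simply the total number of sequenced bases divided by the length of the target. The sketch below inverts that relationship. The function name `bases_required` is hypothetical, and the genome and clone sizes in the usage note are illustrative assumptions, not figures from BGI.

```python
def bases_required(target_length_bp: int, fold_coverage: float) -> float:
    """Return the total sequenced bases needed to reach a given fold coverage."""
    if target_length_bp <= 0 or fold_coverage <= 0:
        raise ValueError("target length and fold coverage must be positive")
    # coverage = sequenced bases / target length, so bases = coverage * length
    return target_length_bp * fold_coverage
```

Assuming a 1-gigabase genome, 50-fold coverage calls for about 50 gigabases of sequence; assuming a 150-kilobase BAC clone, 50-fold coverage of that clone calls for about 7.5 megabases.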

Complex genomes take about one year to sequence and assemble, about twice as long as common genomes, Peng said.

While BGI last year said it would collaborate with OpGen on its optical mapping technology for de novo sequence finishing of plant, animal, and microbial genomes (IS 11/9/2010), Peng said that the researchers are not using the company's technology for this particular project, and that it is mainly being used for microbiology projects. She added that BGI has used it for one animal genome, but has not yet obtained the results.

Many of the species are being sequenced as part of larger collaborations. For example, the cacao genome was an international collaboration involving researchers in France, the US, Venezuela, Brazil, Korea, and the Ivory Coast.

BGI has said it will contribute $100 million to the 1,000 Plant and Animal Reference Genomes project, with collaborators contributing additional funding. It is still seeking collaborators for some projects, and while it does not have a set timeline for the completion of the project, it expects to have 200 species sequenced and assembled by the end of the year.


Have topics you'd like to see covered by In Sequence? Contact the editor at mheger [at] genomeweb [.] com.