BGI is working with the Chinese government to develop a national database to store genomic information generated through its own sequencing efforts as well as those of other local and international research groups.
China’s National Development and Reform Commission is creating the database in collaboration with BGI. The partners expect the resource to be "one of the world's largest gene banks," Yang Huanming, BGI’s president, said in a statement.
Qi Chengyuan, chief of the NDRC, said the new national gene bank will help China "better protect, research, and utilize its precious genetic resources, boosting the genetics industry and safeguarding the country's genetic information."
He further stated that the resource is based on data and facilities belonging to BGI but is expected to grow with the help of “extensive cooperation with other biological organizations both at home and abroad.”
BGI is currently the world's largest genome center and houses more than 130 Illumina HiSeqs, around 30 Life Technologies SOLiDs, and more than 100 Sanger sequencers. The institute, which has facilities in Shenzhen and Hong Kong, generated about 500 terabytes of next-generation sequence data last year. Its current output is around 5.6 TB of data a day, a spokesperson said.
Yong Zhang, the project director of the National GeneBank at BGI, told BioInform via e-mail that the massive output of the institute was one reason that China decided to build its own database.
The decision was also prompted by the fact that the US National Center for Biotechnology Information announced earlier this year that it would phase out its Sequence Read Archive, a BGI spokesperson said, though NCBI has since decided to continue to support the SRA (BI 6/17/11).
BGI will continue to submit data to NCBI, the European Bioinformatics Institute, and other databases for use in research publications. The institute is also “working very closely with the International Nucleotide Sequence Database Collaboration and other parties” to determine whether the resource will be a part of the consortium as well as how it will be operated, Zhang said.
INSDC comprises the DNA Data Bank of Japan, GenBank, and the EBI’s EMBL Nucleotide Sequence Database.
BGI is currently developing the infrastructure to house the data. The institute is partnering with an unnamed storage solution provider but is also on the lookout for additional partners, Zhang said.
The resource is expected to require petabytes of storage for the genetic data and will likely need to expand to the exabyte scale in the future.
While the resource will be used to house data from BGI primarily, it will also accept data from other research groups in China and in the international scientific community.
BGI isn’t placing any restrictions on the data that can be submitted, Zhang said, though only "high-quality" data will be accepted.
As for access to the data, Zhang said that there will be no "specific restrictions, but it highly depends on the data owners." Data submitted as part of a journal publication "will follow the common rules," while access to other types of data "will be further discussed."
BGI also plans to launch a scientific journal called GigaScience that will allow authors to submit their "big data" alongside research findings. The institute will provide additional details at the Bio-IT Asia Pacific conference it is co-organizing, which will be held on July 7 in Shenzhen.