China this week officially kicked off its participation in the Human Variome Project — an international effort to create a centralized catalogue of all human disease variants.
At an HVP meeting held in Beijing this week, the Chinese government launched the Human Variome Project Chinese Node, which has committed to providing 25 percent of the effort required to complete the resource.
China has earmarked $300 million, to be disbursed over 10 years, to support the node's collection and curation activities.
China's variant database will be housed at Zhejiang University in Hangzhou, with an administrative office in Beijing that will handle data collection and other tasks, Ming Qi, the director of the University's Center for Genetic and Genomic Medicine, told BioInform.
The database will include information on the properties of variants such as whether they are the result of a nucleotide change, a mutation, or a stop codon, as well as functional information on the variants, he said.
Qi, who is leading China's participation in the HVP, also said that a number of Chinese hospitals, academic institutions, and research centers — including genome sequencing powerhouse BGI — have signed on to submit data to the project.
He added that international groups are also welcome to contribute their variant information to the database, which is based on the Leiden Open Variation Database, an open source tool developed at Leiden University to collect and display DNA variants.
The Human Variome Project
Launched in 2006, the HVP is an international consortium of scientists, clinicians, informaticists, and diagnostic laboratory specialists who have come together to capture and share information on genetic variants affecting disease. Its members are also working to establish and maintain the standards, systems, and infrastructure that will enable these activities.
"We want to see a gene/disease-specific database for every gene in the human genome and a country node in each country, feeding information to the gene/disease databases," Timothy Smith, the consortium's communications officer, told BioInform via e-mail.
He added that the HVP's goal isn't to establish new databases, but rather to act as an "umbrella organization" that identifies "partners that have the capacity to do so and work with them to ensure they are aware of other players in the field, participate in the development of technical standards, and have access to the data they need to populate their databases."
A roadmap detailing the HVP's agenda through 2012 explains that the consortium has put in place two methods of collecting data in an attempt to accommodate the ethical, cultural, and legal requirements of participating nations, as well as to ensure that no data sources are left out and to take advantage of expert curation for specific genes.
The first of these approaches focuses on establishing centralized nodes, such as the one in China, that are funded and maintained by participating countries.
Among other tasks, nodes are responsible for securing support and funding within their countries; ensuring that the database maintains consortium standards; evaluating the systems used to collect and store the data; ensuring system interoperability; developing methods for integrating variant and phenotype information; and addressing ethical concerns.
So far, the group has also established similar nodes in Austria, Australia, Belgium, Cyprus, Egypt, Greece, Kuwait, Malaysia, Spain, and Vietnam.
The second approach to collecting data relies on existing gene- and disease-specific databases that draw their information from scientific literature and from diagnostic and research labs. HVP organizers plan to encourage researchers involved in these initiatives to consolidate their efforts and resources.
As an example of these efforts, the International Society for Gastrointestinal Hereditary Tumors, or InSiGHT, has begun uniting separate databases for mismatch repair genes that are associated with colorectal cancer, the HVP said in its roadmap document.
InSiGHT is also participating in an HVP pilot project to create a template for disease-specific gene data collection, focusing on four colon cancer genes.
HVP is creating standardized mutation nomenclature, variant description and annotation methods, a clinical ontology, and methods that will allow diagnostic labs to characterize unclassified variants and capture new mutations as they run tests.
According to a paper that the consortium published in January in Human Mutation, variant entries should include gene and variant name, pathogenicity, test data, disease information, and patient information, among other details.
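As a rough illustration of what such an entry might look like in software, the sketch below models the fields listed above as a simple Python record and reports which ones are still empty. The class name, field names, field types, and sample values are all hypothetical and chosen only for illustration; they are not taken from the Human Mutation paper or from any HVP software.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class VariantEntry:
    """Hypothetical record with the categories of information named in the article."""
    gene: str = ""
    variant_name: str = ""
    pathogenicity: str = ""
    test_data: Dict[str, str] = field(default_factory=dict)
    disease_info: str = ""
    patient_info: Dict[str, str] = field(default_factory=dict)


def missing_fields(entry: VariantEntry) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Made-up placeholder values, not real variant data.
entry = VariantEntry(
    gene="GENE1",
    variant_name="example-variant-1",
    pathogenicity="unclassified",
)
print(missing_fields(entry))  # ['test_data', 'disease_info', 'patient_info']
```

A completeness check of this kind is one plausible way a submitting lab could flag partial entries before they reach a curator, though the article does not describe how the consortium handles incomplete submissions.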
Furthermore, the HVP will provide data visualization tools and database software for members' use. One of these programs, VariVis, generates graphical models of gene sequences with corresponding variations and their consequences.
Once the data has been collected, it will be manually curated as a final quality control step before it's made publicly available.
The HVP is "investing heavily" in developing curation methodologies and automated curation tools to help with the costs of curating gene- and disease-specific databases, the roadmap document said.
Ultimately, variant data collected from all these sources will be deposited in several central databases, including Online Mendelian Inheritance in Man; the Human Gene Mutation Database; the University of California, Santa Cruz, Genome Browser; and repositories hosted by the National Center for Biotechnology Information and the European Bioinformatics Institute.
The consortium is also working with other groups that are involved in cataloging genetic variants.
These include UC Berkeley's Genome Commons project; the Genotype-to-Phenotype project, which aims to combine human and model organism genetic variation databases; the Catalogue of Somatic Mutations in Cancer; and the Mawson database, which is gathering genetic variants found during testing of the BRCA1 and BRCA2 genes in Australian laboratories.
By the end of 2012, the HVP consortium plans to have established country-specific collection systems in five countries and shared information on 100,000 individual cases of inherited disease. By 2015, the project organizers plan to increase the number of nodes to 20 with information on more than a million cases of genetic disease.
While the HVP is one of the larger efforts to provide centralized access to disease variants, there are other ongoing attempts to consolidate disparate variant databases.
One of these projects is focused on harmonizing two clinical-grade variant databases — MutaDataBase, an effort led by a non-profit foundation of the same name based in Belgium; and ClinVar, a resource under development at the National Center for Biotechnology Information (BI 8/5/2011).
Smith told BioInform that the HVP is working with the researchers in the MutaDataBase project.
"They have a huge store of very useful information that will hopefully soon become available to assist in the diagnosis and treatment of patients all around the world," he said.