NEW YORK (GenomeWeb News) – A newly formed international consortium recently unveiled a resource for improving the human reference genome.
The Genome Reference Consortium is a small group of centers and institutions that are actively working not only to take stock of the gaps and small errors in the human reference genome, but also to incorporate new information about the magnitude of normal variation in the genome.
The group launched a new website last week to coincide with the Biology of Genomes meeting at Cold Spring Harbor Laboratory. The site lets users access information on individual chromosomes and report problems with specific regions of the reference sequence.
“Pretty much everyone, uniformly, thought it was a good idea,” NCBI staff scientist Deanna Church, who presented a poster introducing the GRC at the Biology of Genomes meeting, told GenomeWeb Daily News.
The GRC consists of members from the Wellcome Trust Sanger Institute, the Genome Center at Washington University, the European Bioinformatics Institute, and the National Center for Biotechnology Information. The project is being funded by the National Human Genome Research Institute and the Wellcome Trust.
The wet bench work will be done at the Sanger Institute and Washington University’s Genome Center, Church said, while the EBI and NCBI are offering “bioinformatics support to make curation for experimentalists simpler.”
“It is now apparent that some regions of the genome are sufficiently variable that they are best represented by multiple sequences in order to capture all of the sequence potentially available at these loci,” the GRC website states. “The goal of this group is to correct the small number of regions in the reference that are currently misrepresented, to close as many remaining gaps as possible, and to produce alternative assemblies of structurally variant loci when necessary.”
The first seeds of the GRC were planted several years ago. Even as researchers were publishing papers on chromosomes sequenced during the Human Genome Project, they realized that they weren’t capturing all the information necessary for a complete reference. “It was appreciated at the time that there were still gaps and things like this,” Tim Hubbard, head of informatics at the Sanger Institute, explained. “There was the question of what to do about that.”
Newer technology, sequencing of additional human genomes, and an increasing appreciation of normal human genetic variation underscore the need to re-assess the reference genome.
“Overall, [the human reference] is really quite a good assembly,” Church said. But, she added, that doesn’t mean the specific locus a researcher might be interested in is 100 percent correct.
“We have issues, I think, for pretty much every chromosome right now,” she said, noting that the consortium has received reports of problems in all but one or two chromosomes. “We are very aware that there are certain regions in the genome associated with copy number variations and disease phenotypes.”
The GRC website currently includes literature related to each of the human chromosomes, along with potential problems or concerns that have been noted for each so far. For instance, the site includes a table summarizing the regions of the reference genome currently under review. That table and other features on the site will be updated shortly, Church noted. The site also houses similar information for the mouse reference genome.
In order to improve the human and mouse reference genomes, the GRC is asking researchers to report any issues they have discovered in particular regions of the reference genome. New data is also being collected by members of the collaboration and through projects such as the 1000 Genomes Project to inform future reference assemblies.
The GRC website currently contains information on builds 35 and 36 of the human reference genome. The collaboration hopes to release the next build in the spring of 2009 and to update it annually after that, Hubbard said. Though he noted that annual releases may inconvenience those who have to remap their data each time a build is released, Hubbard emphasized the need for an accurate and complete reference genome.
“Overall, it’s necessary. It’s really important to do this to make sure people aren’t being misled,” Hubbard said, adding, “We’re open for business in terms of collecting information.”