NEW YORK (GenomeWeb) – Researchers at the University of California, Santa Cruz, Genomics Institute have received a grant of up to $1 million from the Simons Foundation that will support a one-year pilot project to create a comprehensive map of human genetic variation for biomedical research.
Co-leading the project are David Haussler, a professor of biomolecular engineering and director of the Genomics Institute at UC Santa Cruz, and Benedict Paten, a research scientist at the Genomics Institute.
They'll work with scientists at the Broad Institute, Memorial Sloan Kettering Cancer Center, UC San Francisco, Oxford University, the Wellcome Trust Sanger Institute, and the European Bioinformatics Institute to develop algorithms and formulate the best mathematical approaches for constructing a new graph-based human reference genome structure that will better account for and reflect the different kinds of variation that occur across populations. They'll test algorithms developed as part of the project on tricky parts of the genome within the first six months of the pilot, Paten said in a statement.
The researchers will use a dataset of more than 300 complete and ethnically diverse human genomes sequenced by researchers at the Broad Institute to construct the reference structure. They'll also leverage work on a standard data model for the structure done by members of the reference variation task team, a subgroup of the data working arm of the Global Alliance for Genomics and Health that Paten co-leads.
The project aims to overcome the limitations of the current model for analyzing human genomic data, which relies on mapping newly sequenced data to a single set of arbitrarily chosen reference sequences, resulting in biases and mapping ambiguities. "One exemplary human genome cannot represent humanity as a whole, and the scientific community has not been able to agree on a single precise method to refer to and represent human genome variants," Haussler said in a statement. "There is a great deal we still don't know about human genetic variation because of these problems."
Paten added that the proliferation of different genomic databases within the biomedical research community has resulted in hundreds of specialized coordinate systems and nomenclatures for describing human genetic variation. This poses problems for tools such as the widely used UCSC Genome Browser, which was developed and is maintained by UCSC researchers. "For now, all our browser staff can do is to serve the data from these disparate sources in their native, mutually incompatible formats," Paten said in a statement. "This lack of comprehensive integration, coupled with the over-simplicity of the reference model, seriously impedes progress in the science of genomics and its use in medicine."
The diversity of genomes in the Broad's dataset, Paten continued, offers a rich data resource that will be used "to define a comprehensive reference genome structure that can be truly representative of human variation." The plan is eventually to expand the graph structure to include many more genomes, he said.
The researchers expect to have a draft variation map available by the end of the year. Paten and Haussler have also outlined the follow-up activities needed to extend the pilot project and fully realize their vision for the new map.
The new map will make it easier to detect and analyze both simple and complex variants that contribute to conditions with a genetic component, such as autism and diabetes. It will also be a valuable tool for understanding recent human evolution, according to the researchers.