NEW YORK – The National Institutes of Health has awarded grants totaling $29.5 million to two centers involving researchers in the US and Europe to generate and maintain a new human genome reference sequence that will better represent human diversity.
"The proposed improvements will serve the growing basic and clinical genomics research communities by helping them interpret both research and patient genome sequences," Adam Felsenfeld, program director in the Division of Genome Sciences at the National Human Genome Research Institute, which manages the awards, said in a statement.
Approximately $12.5 million over five years will go to Washington University in St. Louis; the University of California, Santa Cruz; and the European Bioinformatics Institute to form the WashU-UCSC-EBI Human Genome Reference Center. In coordination with the National Center for Biotechnology Information, the center will provide a multi-genome reference sequence or "pan-genome."
"We will create a high-quality map of sequence alignments and variants and use the genome graph methods that we have pioneered to build a pan-genome resource that naturally represents genetic diversity," the researchers said in the grant abstract.
Funded by approximately $17 million over five years, UCSC will lead the Human Reference Genome Sequencing Center, which will generate complete, error-free, gapless, and haplotype-phased genome assemblies from as many as 350 individuals. The center will include collaborators at WashU, the University of Washington School of Medicine, Rockefeller University, Mount Sinai, Harvard University, the Broad Institute, the Coriell Institute for Medical Research, Canada's McGill University, the UK's University of Cambridge, and Germany's Max Planck Institute. UCSC researcher Karen Miga will direct the center.
"One human genome cannot represent all of humanity," David Haussler, director of the UC Santa Cruz Genomics Institute and principal investigator on the UCSC-led grant, said in a statement. "The human pan-genome reference will be a key step forward for biomedical research and personalized medicine. Not only will we have 350 genomes representing human diversity, they will be vastly higher quality than previous genome sequences."
"We are going to use all of the latest and best sequencing technologies and push their capabilities to get the most complete and accurate sequences possible," Haussler said, including long-read and linked-read sequencing technologies. Oxford Nanopore Technologies, Pacific Biosciences, and Illumina "will be supporting the center as contributing partners," UCSC said in a statement.
The new effort will continue the work of the Genome Reference Consortium (GRC), which provides reference assemblies for the human, mouse, zebrafish, and chicken genomes. Its latest human reference genome assembly, GRCh38, was released in 2013 and has since been updated with patches. Last year, the GRC noted on its website that it had decided "to indefinitely postpone" the release of GRCh39 as it evaluates "new models and sequence content for the human reference assembly currently in development."
The new pan-genome program "is intended to replace and update our previous contribution to genome reference activities," an NHGRI spokesperson said in an email. "In the process we will be doubling our funding for this important activity."
The lead investigators for the UCSC-led Human Pangenome Sequencing Center include Evan Eichler at the University of Washington, Ira Hall at WashU, and Erich Jarvis at Rockefeller University. The UCSC participants include Benedict Paten, Ed Green, and Mark Akeson.
The lead investigators for the Human Pangenome Reference Center include Ting Wang and Hall at WashU, Paten at UCSC, and Paul Flicek at the EBI. Other participants include Cambridge's Richard Durbin, Max Planck's Gene Myers, the Wellcome Sanger Institute's Kirsten Howe, NHGRI's Adam Phillippy, the Broad's Heng Li, Mount Sinai's Eimear Kenny, Coriell's Alissa Resch, and the Chan Zuckerberg Initiative's Paolo Carnevali.
In their grant abstract, the sequencing center researchers listed four aims: to sequence individuals who can help fill gaps in human genetic diversity, to generate highly contiguous chromosome-level assemblies, to finish those genomes from telomere to telomere (T2T) for each chromosome, and to evaluate the genomes for accuracy and completeness and perform initial variant calling. Miga and Phillippy are coordinators of the T2T Consortium, which last month posted a preprint to bioRxiv describing a telomere-to-telomere assembly of a complete human X chromosome.