NEW YORK (GenomeWeb) – Nearly half the world has Asian ancestry, yet from a genomics standpoint, these populations are poorly understood. Such a knowledge gap has implications for precision medicine, according to Stephan Schuster, professor at Nanyang Technological University in Singapore, who is heading up the GenomeAsia 100K Project to sequence the whole genomes of 100,000 individuals to 30x coverage.
Schuster presented the project at last week's Advances in Genome Biology and Technology meeting in Orlando, Florida, and spoke with GenomeWeb on the sidelines of the conference.
Most genomic studies to date have focused on individuals of European descent, thus missing "the bulk of genetic diversity in the world," Schuster said. There is "unexpected and unprecedented" diversity in Southeast Asia that needs to be charted in order to realize the promise of precision medicine, he added.
The GenomeAsia 100K project will attempt to capture this missing diversity. The consortium so far includes Nanyang Technological University, Macrogen, MedGenome, the Genomic Medicine Institute at Seoul National University, and Illumina, although Schuster said that it is open to adding collaborators.
Nanyang Technological University will serve as the coordinating institution, Macrogen and MedGenome will provide sequencing, and MedGenome and the Genomic Medicine Institute at Seoul National University will provide most of the samples.
The project will attempt to address some of the gaps in the knowledge of Asian genomics by sequencing 100,000 individuals, including disease cohorts from oncology, neurology, diabetes, autoimmune disorders, cardiovascular disease, ophthalmological disorders, and rare inherited diseases.
Already, Schuster said, the organizations have biobanked samples from 50,000 individuals who have consented to broad genomic research testing, and have sequenced the genomes of 2,000 of those. The individuals represent geographical regions across the entirety of Asia, Schuster said, which "will allow us to capture the diversity."
The majority of the sequencing will be done on Macrogen's Illumina HiSeq X Ten system. MedGenome will contribute sequencing and will also provide significant analysis.
The project is expected to last for around three years and could cost upwards of $100 million, of which MedGenome has already announced it would contribute $10 million. Other founding institutions will also contribute, although they have not yet specified how much.
Schuster said the project would include two components: the population sequencing aspect and an effort to generate Asian reference genomes. He said the researchers could create between 50 and 100 de novo assembled Asian reference genomes using Illumina and PacBio sequencing.
A better understanding of population-specific genomic variation is a critical component of precision medicine, Schuster said.
For example, he said, variations in the OCT1 gene, which encodes a transporter of vitamin B, differ among ethnic populations. Variants in the cardiovascular disease gene MYBPC3 have a high incidence in India and central Asia, but not in other Asian countries. Genetic variants that cause glaucoma are also present at different frequencies in different populations. "Ancestry and disease are related," Schuster said.
In addition, individuals often do not know their ancestral background. When Archbishop Desmond Tutu's genome was sequenced, it was discovered that he had significant Khoisan ancestry that he was unaware of, Schuster said. When Personalis CEO John West, who is British, had his genome sequenced, he discovered that he was more closely related to Central European and German populations, Schuster said, joking that he should be called Hans.
Understanding the Asian contribution to genetic diversity will have a global impact, Schuster said. Asia holds nearly half the world's population and, increasingly, individuals from different populations are mixing.
Currently, of the top 10 countries from which people are migrating, five are Asian countries, and those individuals are predominantly migrating to non-Asian countries.
Another goal of GenomeAsia 100K, Schuster said, is to work with other large-scale population sequencing projects to "harmonize the process" of population sequencing, that is, to agree on using the same methods for sampling, phenotypic data collection, quality standards, and re-contacting individuals, so that the final data can be compared between projects like Genomics England's 100,000 Genomes Project and the US's Precision Medicine Initiative. "The full value of the data will only be realized if it's comparable to the other projects," Schuster said.
Kartik Kumaramangalam, chief of global products and services at MedGenome, told GenomeWeb during the AGBT meeting that MedGenome's role in the project would be twofold: providing access to samples from patients who have consented to research, and understanding the data.
Like Schuster, Kumaramangalam said the GenomeAsia 100K project is critical for understanding global genetic diversity. "If you look at the greater Asia region there is a huge population — 40 to 50 percent of the world's population," he said. If researchers do not understand the genetic diversity of these populations, that means "we're not getting insight into diseases for most of humanity."
The project also fits in with MedGenome's vision of accelerating genetic research, and particularly understanding the population substructures of the Asian region, Kumaramangalam said. MedGenome offers diagnostic genomic testing in India, so it is critical for the firm to have a handle on how Asian-specific variants impact disease or how pathogenic mutations differ or have different penetrance in the populations MedGenome serves, he added. "It matters for precision medicine," Kumaramangalam said.
Toward an Asian reference genome
Separately, Macrogen has already made significant headway on the Asian Genome Project, which it launched in 2014. One portion of that project involved generating a medical-grade Asian reference genome from an individual called AK1. At last year's AGBT meeting Macrogen provided an update on the project.
Since then, Macrogen researchers have continued to refine that genome using a variety of technologies, including Pacific Biosciences' single-molecule sequencing technology, Illumina sequencing on the HiSeq X Ten for error-correcting PacBio contigs and phasing, BioNano Genomics' Irys system to create optical maps and scaffolds, 10X Genomics' linked-read technology for long-range phasing, and BAC clone CE sequencing to check consistency with the assembly and phasing.
Jeong-sun Seo, chairman of Macrogen, told GenomeWeb that the researchers have so far created a de novo assembly using the PacBio RSII with an N50 contig size of around 18 megabases. Scaffolding with BioNano's Irys along with fragile site rescue resulted in an N50 scaffold size of 45 Mb, while de novo phasing using the RSII, 10X Genomics' GemCode, and BAC clones resulted in phased N50 haplotype blocks of 11 Mb.
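For readers unfamiliar with the metric, N50 is the length at which sequences of that size or longer account for at least half of the total assembled bases; the same definition applies to contigs, scaffolds, and phased haplotype blocks. A minimal Python sketch of the calculation follows. The function name `n50` and the toy lengths are assumptions for illustration only and are not AK1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in megabases (made up, not from the AK1 assembly).
toy_contigs_mb = [30, 18, 12, 8, 5, 2]
print(n50(toy_contigs_mb))  # prints 18
```

In this toy set the total is 75 Mb, and the two longest contigs (30 Mb and 18 Mb) together pass the halfway mark of 37.5 Mb, so the N50 is 18 Mb. A higher N50 generally indicates a more contiguous assembly, though it says nothing by itself about accuracy.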
Seo said the researchers plan to submit the assembly for publication in a peer-reviewed journal by the end of the month.
Prior to the announcement of the GenomeAsia 100K project, Macrogen had already completed the first phase of the Asian Genome Project to sequence 1,000 individuals of Korean, Mongolian, Han Chinese, and Japanese descent, and had started on phase two to sequence 10,000 Asian genomes from various disease cohorts, Changhoon Kim, Macrogen's chief technology officer, told GenomeWeb.
Kim said that Macrogen's goal for participating in the GenomeAsia 100K project is to help "build a precision medicine initiative in Asia."
In a preliminary analysis of the AK1 assembly, Kim said that the researchers have identified around 20,000 structural variants that are not present in the GRCh38 human reference genome.
Many of those structural variants are potentially Asian-specific. And, although the results are preliminary, "in many of the novel insertions, new isoforms are being made and expressed," Kim said. He declined to disclose details of those findings until the paper is published, but said that such findings underscore the importance of having ancestry- and population-specific de novo assembled genomes.
"The construction of a medical-grade genome" will be a "very good stepping stone" toward the goal of precision medicine, Seo said.