NEW YORK – A Chinese Academy of Sciences team assembled a variant resource, genetic reference panel, and imputation server centered on populations in China, making it possible to better interpret and unearth loss-of-function and other variants with potential disease implications.
"Our study provides a large and high-quality [whole-genome sequencing] resource for Chinese populations, which will be useful in examining the effect of known genetic variants on disease susceptibility and drug responses, and benefit clinical investigations in the future," co-senior and co-corresponding authors Shunmin He and Tao Xu, investigators at the Chinese Academy of Sciences, and their colleagues wrote.
In a paper appearing in Cell Reports on Tuesday, members of the Han100K Initiative and several centers in China described tens of millions of variants and more than 5,800 haplotypes identified in populations across China for the NyuWa Genome Resource effort. The resource is named for "NuWa" or "Nüwa," the "mother goddess who was the creator of the human population in Chinese mythology."
The team identified more than 79.2 million single-nucleotide variants and small insertions or deletions (indels) as part of the NyuWa resource set, including 25 million new variants, by deep whole-genome sequencing of 2,999 Chinese participants enrolled across 23 administrative divisions in China. The genomes were sequenced to 26-fold coverage, on average.
"Constructing a comprehensive genome resource platform of the Chinese population empowers medical genetic discoveries in the world's largest population and contributes to the diversity of worldwide human genetic resources," the authors reported.
Among the variants, the team found 1,140 pathogenic variants reported in the ClinVar database, nearly 3,800 long noncoding RNA (lncRNA) splice variants, and more than 22,500 loss-of-function variants falling in coding or noncoding portions of the genome. That loss-of-function set encompasses 18,711 protein-truncating variants and almost 11,500 loss-of-function variants not identified in prior studies.
"The identification of loss-of-function variants for protein-coding and lncRNA genes in this study expands the catalog of loss-of-function variants in nature," the authors explained. "When combined with phenotype information, this resource will provide important biological insights into gene functions."
The researchers also homed in on nearly 19.3 million variants with a minor allele frequency of at least 0.1 percent in another group of 2,902 individuals for an integrated, refined NyuWa reference panel that they have applied to Han Chinese populations in both southern and northern China. They noted that the NyuWa resource and reference panel have been brought together in a variant database that includes an imputation tool.
The reference panel appeared to outperform several other population datasets for imputation in dozens of Asian populations. For example, it curbed imputation error rates by 30 percent to almost 51 percent in individuals with Han Chinese ancestry specifically.
"Population structure and imputation simulation tests support the applicability of one integrated reference panel for northern and southern Chinese," the authors wrote, adding that the broader genome resource is expected to boost future genetic studies on populations in China and perhaps other parts of Asia.
Even so, they noted that most of the samples in the current iteration of the NyuWa reference panel came from individuals in the majority Han Chinese population and emphasized that "performance of the NyuWa reference panel can still be improved by including more minority samples."