NEW YORK – Researchers in South Korea, the US, and the UK have released an initial set of data from the Korean Genome Project (Korea1K), including Korean-specific genome variation patterns, which they said can be a useful resource for clinical and ethnogenetic studies.
The first phase of Korea1K includes 1,094 whole genomes, sequenced at an average depth of 31x, paired with data on 79 quantitative clinical traits, the researchers reported in a study published on Wednesday in Science Advances. They identified 39 million single nucleotide variants and indels, of which half were singletons or doubletons, meaning they were observed only once or twice in the dataset and are extremely rare.
"Also, Korea1K, as a reference, showed better imputation accuracy for Koreans than the [1000 Genomes (1KGP)] panel," the authors added. "As proof of utility, germline variants in cancer samples could be filtered out more effectively when the Korea1K variome was used as a panel of normals compared to non-Korean variome sets."
Of the 1,094 Korean genomes in the dataset, 1,007 were newly generated, the researchers said, and they combined these data with systematically acquired clinical and biochemical measurements from the blood and urine of the participants. They characterized SNVs, indels, copy number variations, transposable element (TE) insertions, and human leukocyte antigen (HLA) types in the Korean population and contrasted the Korean data with similar data from other populations.
Approximately half of the variants they identified were classified as singletons or doubletons. Surprisingly, more than 70 percent of them had not been previously reported in dbSNP, and fewer than 20 percent of the variants were classified as very common. Regarding indels, the researchers observed more deletions than insertions, possibly resulting from skewed variant calling.
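For readers unfamiliar with the terms, a singleton is a variant whose alternate allele is seen exactly once across all sequenced samples, and a doubleton is seen exactly twice. The short Python sketch below illustrates that bookkeeping. It is not the researchers' pipeline: the function name `classify_by_allele_count`, the variant IDs, and the counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict


def classify_by_allele_count(allele_counts: Dict[str, int]) -> Counter:
    """Count variants whose alternate allele was observed once, twice, or more."""
    labels = Counter()
    for count in allele_counts.values():
        if count == 1:
            labels["singleton"] += 1
        elif count == 2:
            labels["doubleton"] += 1
        elif count > 2:
            labels["other"] += 1
    return labels


# Hypothetical variant IDs mapped to alternate-allele counts across all samples.
toy_counts = {"var_a": 1, "var_b": 2, "var_c": 1, "var_d": 350}
summary = classify_by_allele_count(toy_counts)

# Share of variants that are singletons or doubletons in the toy data.
rare_fraction = (summary["singleton"] + summary["doubleton"]) / sum(summary.values())
print(summary, rare_fraction)
```

In a real dataset the counts would typically come from the allele-count annotations in a variant call file rather than a hand-written dictionary.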
They also found 35 drug response variants annotated in ClinVar. Eleven of them had significantly different allele frequencies compared to Chinese or Japanese individuals in the 1KGP set, highlighting the importance of population-specific datasets when interpreting pathogenic or drug-response variants. For example, the variant rs4961 in the ADD1 gene was more frequent in the Korea1K dataset than in other populations. That variant is associated with hypertension and responsiveness to furosemide and spironolactone, as shown in a European study, but no significant association with blood pressure was found in the GWAS the researchers performed using the Korea1K set.
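The article does not say which statistical test the researchers used to compare allele frequencies. One common choice for this kind of comparison is Fisher's exact test on a 2x2 table of alternate and reference allele counts in two populations. The sketch below implements a two-sided version with the Python standard library only. The function name `fisher_exact_two_sided` and the allele counts are hypothetical, and the test is shown as an assumption, not as the study's method.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are alternate and reference allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x alternate alleles in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Made-up counts: population A has 40 alternate and 160 reference alleles,
# population B has 20 alternate and 188 reference alleles.
p_value = fisher_exact_two_sided(40, 160, 20, 188)
print(f"two-sided p-value: {p_value:.4g}")
```

When many variants are tested at once, as with the 35 drug response variants here, the resulting p-values would normally also need a multiple-testing correction.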
Overall, the researchers noted, the current sample size for the dataset is still insufficient to represent the Korean population or to map latent genomic structural variations.
"Our investigation of using Korea1K as a panel of normals for cancer genomics studies can be a small stepping stone for an efficient germline prefiltering process for cancer genome analyses in the future," they also wrote. "However, it is still questionable how much actual benefit such ethnicity-specific variome-based filtering can bring to cancer genome analyses in real clinical settings, especially for rare or individual-specific variant analysis."
Still, the researchers added, the large-scale Korean variome database contained in the Korea1K reference is potentially applicable in studies on various cancers and other diseases in the Korean population, and could indirectly help reduce the cost of certain genetic analyses.
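The germline prefiltering the authors describe, in which a population variome serves as a panel of normals, boils down to removing tumor variant calls that already appear in a set of variants seen in healthy individuals. The Python sketch below shows that idea on toy data. It is an illustration under stated assumptions, not the authors' workflow: the `Variant` tuple, the `split_by_panel` function, and every coordinate are hypothetical, and real pipelines work on variant call files with normalization and quality filters that are omitted here.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_by_panel(
    tumor_calls: Iterable[Variant], panel_of_normals: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Separate tumor calls into (somatic candidates, likely germline)."""
    panel = set(panel_of_normals)
    candidates: List[Variant] = []
    likely_germline: List[Variant] = []
    for variant in tumor_calls:
        if variant in panel:
            # Seen in the normal panel, so probably inherited rather than somatic.
            likely_germline.append(variant)
        else:
            candidates.append(variant)
    return candidates, likely_germline


# Made-up coordinates for illustration only.
panel = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
tumor = [Variant("chr1", 1000, "A", "G"), Variant("chr3", 42, "G", "A")]

candidates, likely_germline = split_by_panel(tumor, panel)
print("somatic candidates:", candidates)
print("likely germline:", likely_germline)
```

A panel built from the same population as the patient would be expected to contain more of that patient's inherited variants, which is the intuition behind comparing Korea1K with non-Korean variome sets.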
"This kind of personal whole-genome dataset combined with common health check–derived clinical information is possibly a good exemplary path for an ethnicity-relevant reference panel for future personalized medical applications for Koreans," the authors concluded.