NEW YORK – By focusing on diverse populations in Singapore, an international team led by investigators at the Agency for Science, Technology, and Research (A*STAR) got a glimpse at the genetics of individuals of Chinese, Malay, and Indian ancestry, groups that have been underrepresented in past population sequencing efforts.
"The three major ethnicities in Singapore together provide a unique snapshot of the genetic diversity across East Asia, Southeast Asia, and South Asia," co-corresponding authors Chaolong Wang, an epidemiology, biostatistics, computational, and systems biology researcher affiliated with Huazhong University of Science and Technology and A*STAR, and Jianjun Lui, a human genetics researcher at A*STAR and National University of Singapore, and their colleagues wrote in their new Cell paper.
Members of the SG10K Consortium sequenced 4,810 Chinese, Malay, and Indian individuals in Singapore, uncovering more than 98.3 million new or known polymorphisms that were used to evaluate Asian ancestry proportions within these ethnic groups. Along with clues for improving genotype imputation in the populations, they tracked down at least 20 loci showing signs of natural selection, including more than a dozen sites that overlapped with loci implicated in traits or disease through past genome-wide association studies.
Based on these findings, the authors suggested that "[whole-genome sequencing] analysis of Singaporeans has the potential to benefit populations across Asia and the remainder of the globe."
"Coupled with a much better coverage of rare variants and the rapid accumulation of Asian GWAS data, we expect our SG10K data to be a valuable resource to advance genetic studies of heritable traits and complex diseases in Asians and to mitigate the population disparity in current human genetics research," the authors wrote.
The team also began retracing relationships between the Asian populations, including a split roughly 24,800 years ago between the lineages leading to Malay and Chinese populations. More recently, around 1,700 years ago, the researchers saw evidence of admixture between the Malay population and populations from East Asia, interactions suspected to have stemmed from the Austronesian expansion.
For their analyses — which represent an early phase of a project intended to include around 10,000 individuals from Singapore — the researchers attempted to get a glimpse into the genetics of the roughly 4.5 billion people living in Asia by sequencing more than 4,800 individuals from Singapore to an average depth of 13.7-fold. The participants included 2,780 individuals of Chinese ancestry, 903 Malay individuals, and 1,127 individuals of Indian ancestry.
After removing variants that did not meet its quality control criteria, the team was left with some 89.1 million SNPs and more than 9.1 million small insertions and deletions. Of those, roughly 45.6 million SNPs and 6.3 million indels had not been documented in past population studies.
The investigators confirmed the quality of such variants through comparisons with available genotypes for more than 1,200 individuals profiled previously, before digging into rare variants not covered with arrays. From there, they documented the number and type of variants present in a subset of 2,525 healthy, unrelated individuals from the new study, including variants with ties to recessive disease risk.
For a broader population analysis, meanwhile, the team set the new SG10K variant profiles alongside data from the 1000 Genomes Project, uncovering a cluster of Singapore Malay individuals that was separate from the East Asian cluster that encompassed the Singapore Chinese and Singapore Indian individuals.
In a series of follow-up analyses, the researchers took a closer look at ancestry components in the individuals, more refined relationships within Asia and beyond, and sites under selection in the Singapore populations profiled.
"We were able to detect many loci reported previously in Asian populations … even with a stringent criteria," the researchers reported, adding that "we were able to discover additional selection candidates with solid evidence, where all index SNs showed substantial allele frequency drift."