NEW YORK (GenomeWeb) – A team from Japan and the US did whole-genome sequencing on more than 1,000 healthy individuals from Japan, using the data to detect rare variants and develop a genetic reference panel for the population.
As they reported online today in Nature Communications, the researchers sequenced 1,070 individuals from the Tohoku Medical Megabank Organization (ToMMo) biobank as part of a prospective study known as 1KJPN that eventually aims to unravel genetic and environmental contributors to disease in Japan.
To first characterize normal genetic variation within the population, they used the sequence data to identify 3.4 million small insertions or deletions, tens of thousands of gene-impacting copy number variants, and more than 21 million new or known single nucleotide variants in the healthy genomes.
The data also made it possible for the researchers to put together an imputation panel for the population, which has already been used to impute a variant associated with Moyamoya disease, a progressive disorder affecting the brain's blood vessels.
"Our hope is that 1KJPN will foster basic research by amalgamating more accurate genotype imputation in cohort studies with medical information," senior author Masayuki Yamamoto, a researcher affiliated with ToMMo and Tohoku University, and colleagues wrote, "and thereby aid in constructing an advanced medical system to improve the quality of healthcare services."
After genotyping 1,344 ToMMo biobank samples on Illumina's HumanOmni2.5 array and weeding out those that didn't meet their quality or selection criteria, the researchers were left with samples for 1,070 healthy individuals.
The team sequenced each of the 1,070 genomes to an average depth of 32.4-fold on the Illumina HiSeq 2500 using libraries prepared with a PCR-free protocol.
With the help of filtering methods that took into account read depths and other features, the researchers homed in on 21.2 million high-confidence single nucleotide variants, including 12 million variants not described previously.
On the structural variant side, they saw 3.4 million indels along with 25,923 copy number changes expected to affect protein-coding gene sequences.
Smaller indels tended to be more common than larger alterations in the genomes. On the other hand, when insertions spanning more than 10,000 base pairs did turn up, they were far more likely to be novel.
Using their new high-confidence variant and indel data, the researchers put together a reference panel representing directly sequenced and imputed SNPs from 1KJPN.
In samples from 131 Japanese individuals not sequenced initially, this panel imputed variants more accurately than was possible using sequence data generated for the 1000 Genomes Project from individuals in Japan and elsewhere.
The team also demonstrated the usefulness of its reference panel by applying it to a genome-wide association study involving more than 100 Japanese individuals with or without Moyamoya disease who were genotyped on the Illumina HumanOmni1-Quad BeadChip array. Among the variants imputed with the 1KJPN panel was a SNP in the chromosome 17 gene RNF213 that was significantly associated with Moyamoya disease risk.
Meanwhile, the researchers' copy number analyses pointed to extra copies of a salivary amylase gene called AMY1 in most of the individuals involved in the study, while their look at selection signals in the Japanese genomes pointed to weak purifying selection in both coding and non-coding regulatory elements.
Based on such findings, the study's authors argued that "this [single nucleotide variant] set is expected to contain many very rare variants that can be associated with diseases, and thus should be useful in future [genome-wide association studies] to fully capture causal variants."