NEW YORK (GenomeWeb News) – A Riken and University of Tokyo research team reported in the early, online version of Nature Genetics yesterday that it has sequenced the first personal genome of a Japanese individual.
The researchers first used high-throughput sequencing to tackle the genome of an anonymous Japanese man before comparing the sequence to the human reference genome and to six other personal genomes. In the process, they identified a range of single nucleotide variants, copy number changes, rearrangements, and previously undetected sequence — findings that they say highlight the amount of undocumented variation in the human genome.
"Our analysis suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation," senior author Tatsuhiko Tsunoda, a researcher at Riken's Center for Genomic Medicine, and co-authors wrote.
Genome-wide association and other studies are providing clues about common variants in the human genome and their influence on common disease risk, the team explained. But getting a handle on rarer variants and their contribution to traits and diseases is a trickier undertaking that hinges on targeted and whole-genome sequencing of individual genomes.
The team used the Illumina Genome Analyzer II to do paired-end sequencing of the genome of a Japanese man who had already been genotyped as part of the International HapMap Project.
The man clustered genetically with individuals from the Japanese mainland, Tsunoda told GenomeWeb Daily News in an e-mail message. And because the participant was male, Tsunoda added, it was possible to use X- and Y-chromosome data to help estimate the sequencing error rate.
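The article does not spell out how the sex chromosomes were used. One common line of reasoning is that a male carries a single copy of most of the X and Y chromosomes, so any heterozygous call in those regions cannot be real and must reflect an error. The Python sketch below illustrates that idea only; the function name and the toy genotype calls are hypothetical, it assumes pseudoautosomal regions have already been excluded, and it is not the authors' pipeline.

```python
def hemizygous_error_rate(genotype_calls):
    """Fraction of calls at single-copy (hemizygous) sites that are heterozygous.

    Each call is a pair of alleles, e.g. ("A", "A"). At sites on the male
    X or Y chromosome outside the pseudoautosomal regions, a heterozygous
    call is impossible biologically, so it is counted as an error.
    """
    calls = list(genotype_calls)
    if not calls:
        raise ValueError("need at least one genotype call")
    heterozygous = sum(1 for first, second in calls if first != second)
    return heterozygous / len(calls)


# Hypothetical toy data: four calls on the X chromosome, one of them heterozygous.
toy_calls = [("A", "A"), ("C", "C"), ("G", "T"), ("T", "T")]
rate = hemizygous_error_rate(toy_calls)
print(f"apparent error rate: {rate:.2f}")
```

An estimate of this kind is a lower bound, since an error that turns one homozygous call into a different homozygous call would go unnoticed.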
Of the more than 121 billion bases of sequence generated, the team noted, nearly 96 percent mapped to Build 36 of the human reference genome, covering 86.6 percent of the genome with four or more reads, with an average depth of 40 times. Along with the reference-mapping reads, researchers found three million bases of new sequence, including sequences that seem to correspond to non-reference human genomes.
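The reported depth can be roughly checked from the other figures in the paragraph: mapped bases divided by genome length. The short Python sketch below does that arithmetic. The genome length of about 3 billion bases is an assumption, since the article does not give the figure the authors used, and the input values are the article's rounded numbers.

```python
# Figures quoted in the article (rounded).
TOTAL_BASES = 121e9       # "more than 121 billion bases"
MAPPED_FRACTION = 0.96    # "nearly 96 percent"

# ASSUMPTION: approximate length of the human reference genome;
# the article does not state the value used by the authors.
GENOME_LENGTH = 3.0e9


def average_depth(total_bases, mapped_fraction, genome_length):
    """Average number of times each reference base is covered by mapped sequence."""
    return total_bases * mapped_fraction / genome_length


depth = average_depth(TOTAL_BASES, MAPPED_FRACTION, GENOME_LENGTH)
print(f"estimated average depth: {depth:.1f}x")
```

Under these assumptions the estimate comes out a little under 39-fold, in line with the roughly 40-fold depth the team reported. A smaller denominator, such as the ungapped portion of the reference, would push the estimate slightly higher.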
The team also found some human herpesvirus 4 and Bos taurus genome sequences, which they attributed to possible contamination by materials used for cell line production and culturing methods, respectively.
Their analyses of the genome and comparisons with six previously sequenced personal genomes uncovered more than 3.1 million SNPs — including 9,783 non-synonymous SNPs and 96 nonsense SNPs that fell in protein-coding regions of the genome — as well as 5,319 small deletions and numerous copy number changes and rearrangements.
The researchers also found a surplus of rare, non-synonymous SNPs in the genome. And overall, they noted, single nucleotide changes tended to be more concentrated in telomere, centromere, and HLA regions of all seven genomes.
In addition, the team detected distinct patterns in the frequency of singleton SNPs (variants that don't tag other SNPs) depending on the type of change and where it occurred in the genome. For example, their results suggest that conserved, non-coding regions of the genome tend to have a larger fraction of singleton SNPs than other non-coding regions or synonymous SNVs, Tsunoda noted.
While they cautioned that the short-read sequencing strategy they used can produce false-positive variants unless combined with sensitive mapping and assembly techniques, the researchers concluded that their findings point to "a substantial number of unreported [single nucleotide variations], insertions, deletions, and other variants."
"These results suggest that much variation remains unidentified in the human genome and that whole-genome sequencing provides us with the opportunity to detect human polymorphisms that will be required for the advancement of personalized medicine," Tsunoda and his colleagues wrote.
The team doesn't currently have plans to do a large-scale, whole-genome sequencing project in the Japanese population, Tsunoda explained, though it is sequencing about 500 cancer samples and 500 matched normal tissues to high depth as part of its participation in the International Cancer Genome Consortium.