NEW YORK (GenomeWeb) – The high-quality Asian reference genome being produced by investigators at Macrogen and Seoul National University is expected to serve as the centerpiece for a broader project that will ultimately generate genome sequences for more than 10,000 Asian individuals.
During a teleconference with In Sequence, Kap Seok Yang, Macrogen's chief technology officer; Changhoon Kim, a research director with Macrogen; and Jeong-Sun Seo, director of the Genomic Medicine Institute (GMI) at Seoul National University College of Medicine, provided details on the Asian reference genome effort and the broader Asian genome study.
For the Asian Reference Genome Project, the researchers plan to put together a de novo genome assembly from single-molecule real-time (SMRT) reads produced on a pair of Pacific Biosciences RSII instruments. That assembly will subsequently be improved with reads from BAC clones sequenced on the Illumina HiSeq X Ten.
Genomic DNA for the effort is coming from the same Korean individual sequenced for a 2009 study in Nature. Roughly 100,000 BAC clones have already been generated using cell lines from that man, who is known as "AK1" and is a member of the Altaic language group, the study leaders explained.
With the original Asian genome sequence — and a subsequent copy number study from some members of the same team — researchers have become increasingly aware of the variable sequence and structural features present in the genomes of individuals from Asian populations.
For the Asian Genome Project, Macrogen and GMI investigators are seeking a more complete Asian reference genome to pursue detailed analyses of the populations in Asia.
"Our goal is to make a complete Asian reference genome for future medical practice," Macrogen's Kim said, noting that the team is pursuing a "medical grade" genome sequence that is highly accurate and can serve as a reference in both research and clinical settings.
At the moment, he and his colleagues explained, genotyping errors are apt to occur when short-read sequence data from individuals of Asian ancestry are mapped to the standard human reference genome, which is based largely on sequences from one or a few Caucasian individuals.
That has led to difficulties during pilot work for the larger Asian Genome Project. For the first phase of that study, the researchers set their sights on generating whole-genome and exome sequence data for up to 1,000 healthy Asian individuals.
Yang, Kim, and Seo noted that Asian Genome Project members have already done low- and higher-coverage exome sequencing on around 500 Asian samples using Agilent targeted capture kits and Illumina sequencing, generating exomes covered to 100-fold or more, on average.
But the group has encountered technical problems in the genome arm of the study, owing to the absence of an appropriate reference sequence against which to align its newly generated short-read sequence data.
For instance, the researchers explained, issues can arise when attempting to adequately account for the structural variation found in non-reference populations, while population-related differences in allele frequency may muddle the interpretation of available genome sequences for a given individual.
The team also had concerns about the gaps remaining in the existing human reference sequence, prompting interest in applying a long-read technology to build the new Asian reference assembly.
To that end, the researchers will primarily rely on PacBio long reads for their initial de novo assembly of the Asian reference genome, which they plan to cover to an anticipated average depth of 60-fold.
The Seoul National University College of Medicine is leading the reference genome effort, though sequencing and data analysis stages of the study will be overseen at Macrogen. The reference genome arm of the project is expected to wrap up in the next six months, Yang, Kim, and Seo said, with a price tag of roughly $200,000 to $300,000.
In parallel, members of the Asian Genome Project will continue doing population sequencing on individuals from Korea, Mongolia, and other sampling locations in Asia as part of phase one of the larger study.
That data, in turn, is expected to serve as fodder for more precise calculations of allele frequency patterns, copy number variation profiles, and structural variation in Asia, en route to producing a pan-Asian genome reference.
Along with studying population demography and population history in northern Asia and beyond, the researchers are hoping to understand how common and rare genetic variation contributes to disease risk in the region.
The latter work is the focus of the project's second phase: genome sequencing on more than 10,000 Asian individuals from Korea, Mongolia, Central Asia, Japan, China, and elsewhere in an effort to find new genetic contributors to brain-related disorders, cancer, and other genetic diseases in Asian populations, particularly northern Asian populations such as the Altaic language group. The group is carving out collaborations with hospitals and other groups for that phase of the study, set to take place between 2015 and 2017, and the project may soon expand to include individuals with other conditions.
For example, Yang, Kim, and Seo noted that the researchers plan to work with an association focused on muscular dystrophy in the relatively near future. There is also interest in looking for genetic contributors to autism spectrum disorder in Korea as part of the Asian Genome Project.
At the moment, the team anticipates sequencing individuals' genomes to average depths of around 30-fold apiece using the Illumina HiSeq X Ten for both phases one and two of the project.
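As a rough illustration of what these depth targets imply, average depth is simply the total number of sequenced bases divided by the size of the genome. The short Python sketch below works through that arithmetic for the 30-fold per-genome target, the 60-fold reference assembly target, and a 10,000-genome cohort. It assumes a human genome size of about 3.1 billion bases, a round figure that was not provided by the researchers, and its function names are hypothetical. It also ignores duplicate reads, unmapped reads, and uneven coverage, so actual sequencing requirements would likely be higher.

```python
# Back-of-the-envelope sequencing yield estimate.
# Relationship used: average depth = total sequenced bases / genome size.

# ASSUMPTION: approximate human genome size in base pairs. This round
# figure is illustrative and was not stated by the project team.
GENOME_SIZE_BP = 3_100_000_000


def bases_needed(depth, genome_size_bp=GENOME_SIZE_BP):
    """Return the raw bases required to reach a given average depth."""
    return depth * genome_size_bp


def to_gigabases(bases):
    """Convert a base count to gigabases (1 Gb = 1e9 bases)."""
    return bases / 1e9


if __name__ == "__main__":
    per_genome = bases_needed(30)        # 30-fold short-read genome
    reference = bases_needed(60)         # 60-fold long-read assembly
    cohort = per_genome * 10_000         # phase two: 10,000 genomes

    print(f"One genome at 30-fold: {to_gigabases(per_genome):,.0f} Gb")
    print(f"Reference at 60-fold:  {to_gigabases(reference):,.0f} Gb")
    print(f"10,000 genomes:        {to_gigabases(cohort) / 1e6:,.2f} Pb")
```

Under that assumed genome size, a single 30-fold genome corresponds to roughly 93 gigabases of raw sequence, and the 10,000-genome phase to a little under one petabase.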
The cancer-centered analysis will involve sequencing both tumor and matched normal samples from at least 3,000 participants, though Macrogen representatives noted that the second phase of the study could ultimately involve sequencing as many as 30,000 disease-related genomes.
"We want to get a medical-grade reference sequence [for the Asian Reference Genome Project]," Yang said. "Then we want to expand this to look at Asian disease genomes."
Macrogen is receiving Korean government grants, along with personal donations, to support the broader Asian Genome Project, he and his colleagues explained.
The researchers are optimistic that the reams of sequence data they plan to produce in the coming years will be useful not only for finding population-specific disease contributors, but also for coming up with genomics-based methods to diagnose and treat the conditions considered.