This article, originally published July 8, has been updated to include additional information from GMI-SNU.
By Julia Karow
A team of researchers led by the Genomic Medicine Institute at Seoul National University has sequenced and analyzed the genome of an anonymous Korean man, AK1, using a combination of whole-genome shotgun sequencing and targeted bacterial artificial chromosome sequencing on the Illumina Genome Analyzer, along with microarray-based comparative genomic hybridization.
The genome is the first to be analyzed as part of the Asian 100 Genome Project, which aims to sequence and characterize the genomes of 100 individuals from different Asian countries by 2012 (see other article in this issue) in order to identify disease-relevant genes.
In March, the researchers completed sequencing a second genome, that of a Korean woman, for the project using the Applied Biosystems SOLiD 2 and 3 platform, study leader Jeong-Sun Seo told In Sequence this week. The results of that study are currently being validated, and the researchers plan to submit them for publication next month.
The genome of AK1, published online in Nature last week, is the seventh published human genome sequenced on a next-generation sequencing platform to date, following those of James Watson; an African HapMap individual, sequenced independently by Illumina and Applied Biosystems on their respective platforms; an anonymous Han Chinese individual; an anonymous Caucasian acute myeloid leukemia patient; and Seong-Jin Kim, another Korean (see In Sequence 4/22/2008, 11/11/2008, 6/9/2009, 6/23/2009 and table below).
Five of these genomes were sequenced on the Illumina Genome Analyzer, and one each on the ABI SOLiD and the Roche/454 Genome Sequencer FLX.
The latest study also involved scientists at the National Center for Genome Resources in Santa Fe, NM, and Korean service provider Macrogen, who both contributed to the sequencing as well as the data analysis; Harvard Medical School, which developed a data interpretation tool; Illumina, which assisted in library construction and provided new reagents that permitted read lengths of up to 106 base pairs; and Psoma Therapeutics, which also helped with the library construction.
The AK1 genome analysis differs from the previous human genomes that have been sequenced in that it uses "technical advances" on the Illumina platform and contains "the most precise analysis of structural variations in an individual human genome to date," according to a statement from NCGR.
The main goal of the project was to generate genome information for AK1 that "can be used to predict potential medical problems in the future," Seo said in an e-mail message.
For the study, the researchers generated approximately 108 gigabases of shotgun sequence data on the Illumina Genome Analyzer, covering the genome at almost 28-fold depth using 36-base single-end reads as well as 2x36-base, 2x88-base, and 2x106-base paired-end reads with short and long inserts.
In addition, they generated another 23 gigabases of single-end and paired-end sequence data from sequencing BAC clones on chromosome 20 and BAC clones targeted for copy number variations.
They also used high-resolution custom-designed microarrays from Agilent Technologies and HumanCNV370-Quad and Human610-Quad BeadChips from Illumina to analyze structural variations in the genome.
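The shotgun depth figure quoted above can be sanity-checked with simple arithmetic: average fold coverage is the number of sequenced bases divided by the length of the genome. The sketch below is illustrative only. The genome length of 3.1 gigabases is an assumption not stated in the article, and the article does not say how the 28-fold figure was derived. If the reported depth counts only bases that were retained after alignment or filtering (also an assumption), the calculation shows roughly what share of the raw data that would imply.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumption: approximate haploid human genome length (not given in the article).
GENOME_SIZE = 3.1e9
# From the article: roughly 108 gigabases of shotgun data, almost 28-fold depth.
RAW_BASES = 108e9
REPORTED_DEPTH = 28.0

# Depth if every raw base counted toward coverage.
raw_depth = fold_coverage(RAW_BASES, GENOME_SIZE)

# Share of the raw bases that would account for the reported depth,
# under the assumed genome length.
implied_fraction = REPORTED_DEPTH * GENOME_SIZE / RAW_BASES

print(f"Depth from raw yield: {raw_depth:.1f}x")
print(f"Share of raw bases implied by {REPORTED_DEPTH:.0f}x: {implied_fraction:.0%}")
```

Under these assumptions the raw yield corresponds to a somewhat higher depth than the reported figure, which would be consistent with some reads being discarded before coverage was computed.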
The researchers identified nearly 3.45 million SNPs, including more than 10,000 non-synonymous ones, as well as approximately 170,000 insertions or deletions, and 315 structural variations. They annotated potential medical phenotypes for non-synonymous SNPs, coding domain indels, and structural variants.
Sequencing consumables cost a total of $200,000, according to Seo. Sequencing took six weeks of run time on three Genome Analyzer instruments over an 11-month project, much of which was spent validating the data, he said.
The project started in early 2008, according to a statement from the Genomic Medicine Institute. The researchers presented interim results at a Korean medical conference a year ago and results from the completed project at another meeting in Korea last November.
The origins of the project, though, date back further. Since 2000, Seo said, GMI has been working on a Korean Genome Project. In 2001, in collaboration with Macrogen, the institute built a BAC clone library and map with 100,000 clones from the DNA of AK1. That work, he said, was the cornerstone for future analysis of AK1, a healthy donor who was 22 at the time and whose DNA was selected randomly from among 10 volunteers. AK1 is not a staff member, according to Seo.
When next-generation sequencing technology became available to them in 2007, Seo said, he and his colleagues used the BACs for targeted analysis of specific haploid regions by deep sequencing. Targeted sequencing started with chromosome 20, but GMI also selected regions known to be copy number variation hotspots and sequenced them with up to 10,000-fold coverage. "To our knowledge, this is the first time to report such high-depth sequencing analysis of the human genome using [next-generation sequencing]," he said.
In combination with a custom-designed Agilent CGH array set with 24 million features, the targeted resequencing results yielded more than 300 copy number variants in AK1, the "most accurate determination of structural variants up to date," according to Seo.
He and his colleagues compared the results to several previously published genome sequences but found that differences could reflect both variations between individuals and different technical approaches, "highlighting the need for definition of foundational data standards," according to the paper.
"Currently, it is rather difficult to cross-compare indels and CNVs found by different groups," Seo said. "For personalized medicine, validated, standardized methods are required to process and compare large numbers of samples. Some consensus on submitting sequencing data is needed to enable cross-analysis between findings from different sequencing platforms and analysis methodologies."
Using Trait-O-Matic, a software tool developed by George Church's group at Harvard Medical School, the researchers analyzed the results for clinical implications. Trait-O-Matic is a web service that accepts a list of SNPs and searches for them in Evidence-Base, a publicly curated database of clinically significant variants. The service is intended to "facilitate a standardized interpretation of individual whole genomes by clinicians," according to the paper.
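The kind of lookup described here, matching a list of SNPs against a curated table of annotated variants, can be sketched in a few lines. This is a minimal illustration of the general idea, not Trait-O-Matic's actual code or interface. The function name `match_variants`, the table layout, and all identifiers and annotations are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def match_variants(
    snp_ids: Iterable[str], curated: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (snp_id, annotation) pairs for SNPs present in the curated table.

    IDs are normalized (trimmed, lower-cased) and each hit is reported once,
    in the order it first appears in the input list.
    """
    seen = set()
    hits = []
    for snp in snp_ids:
        key = snp.strip().lower()
        if key in curated and key not in seen:
            seen.add(key)
            hits.append((key, curated[key]))
    return hits


# Hypothetical placeholder data: not real variant IDs or clinical annotations.
curated_db = {
    "rs0000001": "placeholder annotation A",
    "rs0000002": "placeholder annotation B",
}
genome_snps = ["rs0000002", "RS0000001 ", "rs0000003", "rs0000002"]

hits = match_variants(genome_snps, curated_db)
for snp_id, annotation in hits:
    print(snp_id, "->", annotation)
```

A real service would also have to check the genotype at each position and weigh the evidence behind each annotation. The sketch covers only the matching step.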
Some interesting results emerged. For example, Seo pointed out, AK1 may develop resistance to bleomycin, an antibiotic; have reduced sensitivity to statins; or be susceptible to tuberculosis. "Such information may be helpful for future clinical prognosis/prevention and drug prescription," he said.
The study was funded, in unspecified amounts, by the Korean Ministry of Education, Science and Technology; the Genomic Medicine Research Foundation; Macrogen; Korean biopharmaceutical company Green Cross; the US National Institutes of Health; and NCGR.