By Monica Heger
Following the publication last week of the first human genome sequence of a Japanese individual, researchers at Japan's Riken are now focusing on cancer sequencing projects as part of the International Cancer Genome Consortium.
As described in a paper published last week in Nature Genetics, the Riken researchers sequenced a Japanese HapMap individual on the Illumina Genome Analyzer to 40-fold coverage. Across 12 sequencing runs, read lengths ranged from 32 to 76 base pairs. The team used a paired-end sequencing strategy with 200-base-pair inserts, generating 121.1 gigabase pairs of sequence data in total.
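As a rough check on these figures, average fold coverage is the total sequence yield divided by the genome size. The sketch below assumes a haploid human genome of about 3.1 gigabases, a figure the article does not give; the function and constant names are illustrative only.

```python
# Rough fold-coverage estimate: total sequenced bases / genome size.
# ASSUMPTION: a haploid human genome of ~3.1 gigabases. The article does not
# state which genome size the Riken team used for its coverage figure.

ASSUMED_GENOME_SIZE_GB = 3.1  # gigabases (assumed, not from the article)


def fold_coverage(total_gb: float,
                  genome_size_gb: float = ASSUMED_GENOME_SIZE_GB) -> float:
    """Return the average raw fold coverage for a given sequence yield."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_gb / genome_size_gb


if __name__ == "__main__":
    raw = fold_coverage(121.1)  # total yield reported in the article
    print(f"Approximate raw coverage: {raw:.1f}x")
```

Under that assumption the reported 121.1 gigabases works out to roughly 39-fold raw coverage, in line with the 40-fold figure. Coverage counted only from reads that map to the reference would be somewhat lower.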
The researchers compared the genome to six previously published genomes: Jim Watson's, Craig Venter's, two Korean individuals, a Han Chinese individual, and an African individual from Nigeria. They found an excess of rare variants that appeared to affect gene function, including singleton nonsense and nonsynonymous single nucleotide variations as well as SNVs in conserved non-coding regions.
This finding was not especially surprising, however, said Tatsuhiko Tsunoda, senior author of the paper and head of the lab at Riken's Center for Genomic Medicine. Because nonsense SNVs, nonsynonymous SNVs, and SNVs in conserved regions can all affect gene function, they are more likely than noncoding or synonymous SNVs to be singletons.
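To make the term concrete: taking "singleton" to mean a variant seen in only one of the genomes being compared, the idea can be sketched as below. The variant labels and genome names are hypothetical toy data, and this is not the method used in the paper, which the article does not describe.

```python
from collections import Counter


def find_singletons(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each genome to the variants found in that genome and no other."""
    # Count how many genomes carry each variant.
    counts = Counter(v for variants in genomes.values() for v in variants)
    # A singleton is a variant carried by exactly one genome.
    return {
        name: {v for v in variants if counts[v] == 1}
        for name, variants in genomes.items()
    }


if __name__ == "__main__":
    # HYPOTHETICAL toy data, not taken from the study.
    toy = {
        "genome_A": {"chr1:100:A>G", "chr2:200:C>T", "chr3:300:G>A"},
        "genome_B": {"chr1:100:A>G", "chr4:400:T>C"},
        "genome_C": {"chr1:100:A>G", "chr2:200:C>T"},
    }
    for name, singles in find_singletons(toy).items():
        print(name, sorted(singles))
```

A comparison like this is sensitive to how each genome was sequenced and called, since a variant missed in the other genomes will look like a singleton.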
"Our study revealed a large amount of variation, including SNVs, CNVs, medium-sized deletions, inversions, and non-reference insertions. These results suggest that much variation remains unidentified in the human genome," the authors reported.
Nonsense SNVs were over-represented in genes related to "olfactory receptor activity, sensory perception of chemical stimulus, sensory perception of smell, and antigen processing and presentation," they wrote.
In addition, the team's de novo assembly of unmappable reads generated three megabases of novel sequence. While the novel sequence didn't map to the reference genome, much of it did align to human genome sequences contained in the National Center for Biotechnology Information's database. The researchers also found sequences from human herpesvirus 4 and from cattle, which they attributed to contamination from methods used to treat the cell lines.
"The de novo assembly is quite important because, without it, we cannot get any information from the unmapped reads," said Tsunoda. "By assembling them into contigs, we can predict the actual DNA segments that are deleted or inserted, compared to the reference human genome, and also try to align them to other genomes."
The researchers compared three different assembly algorithms: ABySS, SOAPdenovo, and Velvet. Tsunoda said the three algorithms yielded comparable results.
Tsunoda told In Sequence that the team initially intended to compare the genome on a population basis to the six other genomes, but "found that it is dangerous to say too much because the seven genomes used different platforms, analysis methods, and sequencing read depths, and also because we reported just one Japanese individual."
In order to find Japanese-specific genomic variations, Tsunoda estimated that it would be necessary to sequence "dozens" more individuals. The fact that the de novo assembly of unmapped reads showed sequences found in other individual genomes is indicative of the large individual variation between genomes, he said, rather than population-specific differences.
Tsunoda said that the team is now focused on cancer sequencing projects as part of the International Cancer Genome Consortium. The team is sequencing 500 tumor/normal pairs of liver cancer, as well as the hepatitis B and hepatitis C viruses. They will sequence those samples to 30-fold coverage, using a sequencing strategy similar to the one detailed in the Nature Genetics paper.
Riken's Center for Genomic Medicine is currently equipped with two Illumina GAs and will soon introduce an Illumina HiSeq 2000. Additionally, Tsunoda said Riken plans to buy a third-generation sequencer. The Pacific Biosciences machine is a likely candidate, he said, but they are still evaluating options.
Researchers at the center are also studying nearly 50 diseases using genome-wide association studies. Tsunoda said that in the future, they may use sequencing to study those diseases, which include other types of cancer, myocardial infarction, and diabetes.