NEW YORK (GenomeWeb News) – In a pair of papers appearing online today in Nature, two international research teams reported that they have sequenced the complete genomes of an African and an Asian individual, pinpointing new and previously identified single nucleotide polymorphisms and structural variants.
In the first of these, a team of researchers led by investigators at Illumina used massively parallel sequencing to sequence the genome of a Yoruban male from Ibadan, Nigeria, whose DNA was collected as part of the HapMap project.
The team first validated their approach by sequencing a bacterial artificial chromosome containing human DNA and flow-sorted X chromosomes from a Caucasian female. They then used six Genome Analyzers to sequence the Nigerian man’s genome to an average depth of more than 30-fold, generating some four billion paired 35-base reads over about two months.
The consumables cost roughly $250,000. By comparison, the researchers said that it cost about $300 million to generate the raw data for the first human genome sequence.
The sequence covered 99.9 percent of the human reference genome (NCBI build 36.1) to about 41 times depth, on average. During their subsequent analysis, the team detected about four million SNPs in the Yoruban genome. Just over a quarter of these were distinct from those found in the dbSNP database.
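As a rough sanity check on the read and depth figures reported for the Yoruban genome, raw sequencing depth can be estimated as the number of reads times the read length, divided by the genome size. The sketch below uses the article's figures of four billion 35-base reads; the genome size of about 3.1 billion bases is an assumption made here for illustration, not a number from the paper. The raw estimate need not match the reported mapped depth, since reads that fail to align or are filtered out would typically not count toward it.

```python
# Back-of-the-envelope estimate of raw sequencing depth.
# READS and READ_LEN come from the article; GENOME_SIZE is an assumed
# approximate size of the human reference genome (hypothetical round figure).

def average_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Return raw average depth: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size

READS = 4_000_000_000        # "some four billion" reads (article)
READ_LEN = 35                # 35-base reads (article)
GENOME_SIZE = 3_100_000_000  # assumption: ~3.1 Gb human genome

raw_bases = READS * READ_LEN
depth = average_depth(READS, READ_LEN, GENOME_SIZE)

print(f"{raw_bases / 1e9:.0f} Gb of raw sequence, roughly {depth:.0f}x raw depth")
```

Under these assumptions the raw yield is about 140 billion bases, which is in the same range as the roughly 41-fold average depth the team reported for the mapped sequence.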
“This genome from a Yoruba individual contains significantly more polymorphism than a genome of European descent,” the authors noted. And the Yoruban genome was more heterozygous than Caucasian genomes sequenced so far. This heterozygosity was lower in coding regions than in non-coding regions, although the team detected more than 26,000 SNPs within coding regions. They also found about 400,000 short insertions and deletions — half of which are also in dbSNP.
“Our short-insert paired-read data set introduced a new level of resolution in structural variation detection, revealing thousands of variants in a size range not characterized previously,” the authors said.
The researchers added that the same massively parallel sequencing approach may also be used to understand processes such as transcriptional activity, gene regulation, epigenetics, and chromatin modification.
In a second paper, a team led by Jian Wang, a researcher affiliated with the Beijing Genomics Institute and Shenzhen University’s Genome Research Institute, used a similar approach to sequence the genome of a Han Chinese man from East Asia.
Using Illumina Genome Analyzers, the researchers obtained sequence information for 92 percent of the individual’s genome at 36-fold average coverage. The average read length was 35 base pairs, and the team generated about three billion quality reads while sequencing the genome. That sequence covered 87.4 percent of the NCBI build 36.1 reference sequence to 36-fold average coverage.
The team’s analyses indicate that the man’s genome contains more than three million SNPs. Nearly 14 percent of these were not found in dbSNP. They also detected 135,262 insertions and deletions compared to the reference, along with 2,682 structural variations. Most of these structural variations were within transposable elements or repetitive sequences, the researchers noted, although the Asian genome did contain variations that completely or partially deleted 33 genes.
The team also compared the Asian man’s genome sequence with the genomes of Jim Watson and Craig Venter. Their results suggest that the three genomes share 1.2 million SNPs, with each individual genome containing around a million unique SNPs. The three genomes had similar levels of non-synonymous SNPs: between 0.2 and 0.23 percent.
The researchers went on to examine mutation and selection in the genome, and they performed a disease risk screen along with haplotype and ancestry analyses. For instance, the team estimated that 94 percent of the Han man’s alleles were Asian, while 4 percent were European and nearly 2 percent were African. According to their preliminary search, the man is at increased risk of Alzheimer’s disease and has genotypes linked to tobacco addiction.
But the results of the two studies are not just providing information about individual genomes. Researchers predict that the number of individuals having their genomes sequenced will continue increasing as prices drop and scientists get a better handle on normal human variation through efforts such as the 1000 Genomes Project.
“This sequence and the analyses herein provide an initial step towards attaining information on population and individual genetic variation,” Wang and colleagues wrote, “and given the use and analysis of next-generation sequencing technology, constitute advancement towards the goal of providing personalized medicine.”