NEW YORK (GenomeWeb News) – A Korean-led research team reported online in Nature Genetics this weekend that it has sequenced the genomes, exomes, and transcriptomes of several individuals from Korea.
The researchers sequenced 10 whole-genomes, eight whole-exomes, and 17 transcriptomes, identifying millions of SNPs, many of them not reported before. The analyses also uncovered a host of small insertions and deletions, thousands of new transcripts, and more than 1,800 sites at which DNA and RNA sequences from the same individual didn't correspond to one another.
"Our findings suggest that a considerable number of unexplored genomic variants still remain to be identified in the human genome," corresponding author Jeong-Sun Seo, a researcher affiliated with Seoul National University, Macrogen Inc., Psoma Therapeutics, and Axeq Technologies, and co-authors wrote. They added that "the integrated analysis of genome and transcriptome sequencing is powerful for understanding the diversity and functional aspects of human genomic variants."
Individual sequencing projects and large-scale efforts such as the 1000 Genomes Project have come a long way in characterizing the genetic variation in humans, the study authors noted.
Still, they argued, while millions of SNPs and tens of thousands of structural variants have been catalogued so far, more work is needed to find and interpret the genetic variants underlying many human traits and diseases across populations. They also pointed out that bringing together genome and transcriptome information from the same individuals is one strategy for getting a more refined view of genetic variants and their consequences.
"[T]o gain a more detailed understanding of the genomic landscape of common and rare variants in humans, more individuals from different populations should be whole-genome sequenced at high-depth coverage," they wrote, adding that "comparisons between genomic variants and their corresponding transcriptional profiles from the same individuals need to be performed to help understand the functional aspects of these variants."
To begin doing this in the Korean population, the researchers used the Illumina GAIIx and Life Technologies SOLiD platforms to sequence the genomes of five men and five women from Korea to an average coverage of around 26 times, using DNA from blood samples.
For another six Korean men and two Korean women, they captured protein-coding, microRNA, and other non-coding RNA sequences with the Agilent SureSelect Human All Exon Kit before sequencing the exomes to an average depth of nearly 64 times.
During their subsequent analyses, the researchers detected 35,740 SNPs in the exome sequences and 8.37 million SNPs in the genomes, including roughly 1.83 million that hadn't been reported in the past. Almost three-quarters of these newly detected SNPs seem to be quite rare, turning up in just one of the 10 genomes sequenced, they explained, while the rest appear to be more common within the Korean population.
Overall, each genome contained between 3.45 million and 3.73 million SNPs, including an average of 8,431 non-synonymous SNPs, they reported.
When they combined their genome and transcriptome data, the team found a set of 86 so-called "super nsSNP" genes that were more likely to harbor non-synonymous SNPs than typical genes in the Korean individuals tested. More than half of these genes, 57 percent, were linked to either sensory or immune-related processes. And, the researchers noted, many fell in parts of the genome that are known to be copy number variable.
The 10 Korean genomes also contained nearly 1.2 million small insertions and deletions, as well as roughly 5,500 large deletions in 1,348 regions, while 15,697 indels turned up in the eight exomes. Almost a third of the indels from the whole-genome sequences had not previously been catalogued in the dbSNP database.
Based on the linkage disequilibrium patterns between the rare and common variants detected in the genomes and exomes, the researchers argue that the SNPs on existing microarrays would likely fail to capture many variants in the Korean population.
"[O]ur findings suggest that a substantial number of Korean common functional variants may not be tagged well by neighboring 'tagging' SNPs on microarrays," they explained. "These results suggest that many association studies may have fundamental limitations, especially for populations that were not included in the initial [linkage disequilibrium] assessments on the human genome."
When the researchers used Illumina paired-end sequencing to sequence messenger RNA from lymphoblastoid cell lines generated from blood samples of 17 of the individuals, they found 4,414 previously unannotated transcripts that turned up in at least two of the newly sequenced transcriptomes. Of these, 111 were found in all 15 individuals included in the researchers' experimental data set.
And by bringing together their genome and transcriptome data, the team uncovered a slew of sites at which the RNA sequences in transcripts didn't match the DNA sequences coding for those transcripts.
Of the 1,809 so-called "transcriptional base modifications," or TBMs, that they detected, 188 occurred within protein-coding genes and the remaining 1,621 were found in untranslated regions, suggesting that TBMs may be worth considering in disease studies and other research.
"The TBMs may affect the susceptibility of complex diseases because they are likely to modify mRNA stability and to change amino acids of protein sequence," the researchers concluded. "A combination of deeper genome and transcriptome sequencing of a variety of tissues from more individuals, including those clinically affected, will be necessary to assess the complete profile and the functional impacts of TBMs."