By Monica Heger
This story was originally published on May 20.
In order to overcome the challenges of using next-gen sequencing for de novo assemblies, researchers from the University of Washington demonstrated that it is possible to assemble a primate exome by aligning it to the human reference genome.
"It's feasible to sequence primate genomes, but assembly with short reads is still difficult," Renee George, a postdoctoral student in Willie Swanson's laboratory at the University of Washington, said in a presentation at this month's annual Biology of Genomes meeting in Cold Spring Harbor, NY.
To learn whether they could sequence and assemble a primate exome by aligning it to the human reference, the researchers first tested the method on the macaque monkey, which does have a reference genome. Their aim was to see if it is possible to capture primate exomes with human probes and align and assemble the resulting reads against a human reference genome.
They captured the exome with Nimblegen's SeqCap EZ Exome in-solution kit and sequenced it on an Illumina Genome Analyzer to an average of 60-fold coverage with 76-base paired-end reads. They found good correlation after comparing the read depth of the macaque against two human controls.
"The exons that are captured well in humans are captured well in macaque," George said. Additionally, although read depths declined in the regions of the macaque genome that are more divergent from the human genome, they still remained well-covered.
The team verified the method by showing that the sequenced macaque exome was comparable to the coding regions of the macaque reference genome.
It next applied the method to two Old World monkeys — the colobus and vervet — and one New World monkey, the tamarin. None of the animals has a reference genome.
They used the human genome as the reference rather than the macaque's because the New World monkeys are as divergent from the macaque as they are from the human, and the human genome is "more complete and higher quality," George said.
Each species was sequenced to an average of 80- to 90-fold. Capture efficiency for each species was very high; between 95 percent and 97 percent of the target was covered by at least one read, compared to 98 percent for a human control.
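As an illustration only, the "covered by at least one read" figure can be thought of as the fraction of target bases that overlap any mapped read. The sketch below is not the team's pipeline; the function name, the half-open interval convention, and the assumption of non-overlapping targets on a single chromosome are all hypothetical choices made for the example.

```python
def covered_fraction(targets, reads):
    """Fraction of target bases overlapped by at least one read.

    targets, reads: lists of (start, end) half-open intervals on one
    chromosome. Targets are assumed not to overlap each other
    (an assumption of this sketch, not something the article states).
    """
    total = 0
    covered = 0
    for t_start, t_end in targets:
        total += t_end - t_start
        # Clip every overlapping read to the target, sorted by start position.
        pieces = sorted(
            (max(r_start, t_start), min(r_end, t_end))
            for r_start, r_end in reads
            if r_start < t_end and r_end > t_start
        )
        # Sweep left to right, counting each covered base only once.
        cur_end = t_start
        for p_start, p_end in pieces:
            if p_end > cur_end:
                covered += p_end - max(p_start, cur_end)
                cur_end = p_end
    return covered / total if total else 0.0


# Toy data: two 100-base targets and three reads.
example_targets = [(100, 200), (300, 400)]
example_reads = [(90, 150), (140, 180), (350, 360)]
example_fraction = covered_fraction(example_targets, example_reads)
```

In this toy case 90 of 200 target bases are covered, so the function returns 0.45; a value of 0.95 to 0.97 would correspond to the figures George reported.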
As in the macaque, the regions that captured well in humans also captured well in the New and Old World monkeys.
To assemble the reads, the team used a reference-guided assembly approach, which is a hybrid between a mapping approach and a de novo assembly. They mapped the reads to the human genome and then assembled contigs for each cluster of reads.
"We use the mapping to get the reads in the right location, but also let the reads speak for themselves," George explained. The assembly "roughly corresponds to one contig per exon."
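The clustering half of such an approach might look roughly like the sketch below, which groups reads by where they map so that each group could then be handed to a de novo assembler. This is a simplified illustration under stated assumptions, not the team's software: the function name, the `max_gap` parameter, and the input layout are hypothetical, and the assembly step itself is left out.

```python
def cluster_reads(mapped_reads, max_gap=0):
    """Group reads whose mapped intervals overlap or lie within max_gap bases.

    mapped_reads: iterable of (read_id, start, end) tuples on one
    chromosome, using half-open coordinates on the reference.
    Returns a list of clusters, each a list of read IDs. In a
    reference-guided workflow, each cluster would then be assembled
    on its own (that step is not shown here).
    """
    clusters = []
    cluster_end = None
    # Process reads in order of mapped start position.
    for read_id, start, end in sorted(mapped_reads, key=lambda r: (r[1], r[2])):
        if cluster_end is None or start > cluster_end + max_gap:
            # Read starts beyond the current cluster: open a new one.
            clusters.append([read_id])
            cluster_end = end
        else:
            # Read overlaps or abuts the current cluster: extend it.
            clusters[-1].append(read_id)
            cluster_end = max(cluster_end, end)
    return clusters


# Toy data: 76-base reads mapped around two hypothetical exons.
example_mapped = [
    ("r1", 100, 176),
    ("r2", 150, 226),
    ("r3", 1000, 1076),
    ("r4", 1050, 1126),
    ("r5", 220, 296),
]
example_clusters = cluster_reads(example_mapped)
```

With these toy reads the function yields two clusters, one per exon-sized region, which mirrors the "roughly one contig per exon" outcome George described.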
To assess the quality of the assembly, they mapped the contigs back to the human reference genome to determine what percentage of bases were accurately called.
First they used a filtering process to discard low-quality sequences, masked the low-quality differences and indels that were found relative to the human reference, and masked assembly errors identified by mapping the reads back to the contigs.
They also removed targets with excess heterozygosity and filtered out potential paralogous assemblies by masking segmental duplications from the human, chimp, and orangutan genomes.
For each monkey, the percentage of bases deemed high quality was greater than 88 percent, compared to 94.5 percent for the human exome assembly.
However, in order to figure out if the assembly approach itself was able to produce accurate results, they compared the macaque exome assembly to the macaque reference genome.
George said they achieved a 99.9-percent target identity, and a nucleotide diversity of 0.12, suggesting that the differences are due to polymorphisms between the two individuals rather than sequence or assembly errors, and that the method works well.
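For readers unfamiliar with the metric, an identity figure of this kind can be computed by comparing an assembled sequence with the reference base by base and skipping masked positions. The following is a minimal sketch, assuming the two sequences are already aligned to equal length and that masked bases are written as `N`; the function name and those conventions are illustrative assumptions, not details from the presentation.

```python
def percent_identity(assembled, reference, mask_char="N"):
    """Percent of compared positions at which two aligned sequences match.

    Positions where either sequence carries mask_char are skipped, so
    masked low-quality bases do not count for or against identity.
    """
    if len(assembled) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, r in zip(assembled.upper(), reference.upper()):
        if a == mask_char or r == mask_char:
            continue  # masked position: excluded from the comparison
        compared += 1
        if a == r:
            matches += 1
    if compared == 0:
        raise ValueError("no unmasked positions to compare")
    return 100.0 * matches / compared


# Toy data: one masked base and one mismatch in ten positions.
example_identity = percent_identity("ACGTNACGTA", "ACGTAACGTT")
```

Here eight of the nine unmasked positions match, giving roughly 88.9 percent; real exome comparisons would run over millions of bases.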
'Doubles Previous Estimates'
With high-quality assemblies, George said the team could now ask biologically interesting questions about evolution. In particular, they looked for genes under positive selection.
Combining the exome assemblies with the human and macaque reference genomes, they looked at 15,000 genes and identified 157 with strong evidence for positive selection.
That "doubles previous estimates," George said.
Specifically, they found statistically enriched genes in defense, sensory perception of chemical stimulus, and keratinization, which is the ability to convert squamous epithelial cells into rougher material.
While other researchers have previously found positive selection to occur in genes related to defense and sensory perception, keratinization has not previously been identified as a gene class that has undergone positive selection.
"This is the first time we've seen such an overrepresentation of the category," George said. "It implies that there's selection for setting up physical barriers between the body and outside world."
The team identified six genes related to keratinization that show significant evidence of positive selection: SPRR2E, SPRR2F, IVL, LCE3C, SPRR2B, and EVPL. One of these, IVL, shows evidence of recent selection in humans.
Additionally, they found differences between Old and New World monkeys, including evidence of pseudogenes and deletions in the former group that are not present in the latter group or in humans.
For example, said George, in the macaque and the other Old World monkeys a pseudogene appears as a result of a premature stop codon in the SNTN gene, which is involved in the structure of cilia. Also, GBP5, a gene involved in microbial and viral immunity, is completely deleted in the Old World monkeys.
George said they are continuing to evaluate their results, in particular by following up on the genes involved in keratinization, especially IVL, since it was the most recent of them to evolve.
Though her team's results show it is possible to capture and sequence non-human primate exomes with human probes, George said "the question still remains how far we can push this technology."
They have already demonstrated that it is feasible in species with up to 7-percent divergence. George said the next species her team is considering is the lemur, which averages about 7-percent divergence from humans, but also contains regions with up to 12-percent divergence.
Have topics you'd like to see covered by In Sequence? Contact the editor at mheger [at] genomeweb [.] com.