Gorilla Genome Updated With PacBio Long Read Assembly

NEW YORK (GenomeWeb) – Researchers from the University of Washington, the University of California at Santa Cruz, Washington University, and Pacific Biosciences have put together an updated gorilla genome containing long sequence reads generated by PacBio single-molecule real-time sequencing.

The team used PacBio SMRT sequencing to take another look at the genome of the western lowland gorilla, Gorilla gorilla gorilla, using DNA from a female representative known as Susie. The first gorilla genome, which was generated with short read and Sanger sequencing using DNA from a female western lowland gorilla named Kamilah, was reported in Nature in 2012.

As they reported online today in Science, the researchers were able to fill in most of the reference exon sequences believed to be missing from the original assembly, while increasing the overall contiguity of the genome assembly, which contains fewer, and longer, contigs.

"As medical researchers, if we depend only on short read sequences, there's a chink in our armor," senior author Evan Eichler, a genome sciences researcher at the University of Washington, said in a statement. "The work on gorilla and other human genomes clearly demonstrates that large swathes of genetic variation can't be understood with the short sequence-read approaches. Long read sequencing is allowing us to access new levels of genetic variation that were previously inaccessible."

As with earlier versions of the gorilla genome, the updated "Susie3" assembly is expected to serve as a resource for understanding gorilla biology, primate evolution, and the underpinnings of traits that differ between humans and other primates, such as speech and disease susceptibility.

Using PacBio RS SMRT shotgun sequencing and P6-C4 chemistry, the researchers generated nearly 75-fold coverage of Susie's genomic DNA before doing de novo assembly and error correction with the PacBio FALCON assembler and an existing algorithm called Quiver.

By using long reads and putting the assembly together de novo rather than simply aligning the sequences to the human reference genome, the team was able to fill in sequence gaps and get a more detailed look at structural variants, repeat sequences, and retrotransposons.

The researchers still ran into some headaches in heterochromatic regions and stretches of sequence containing tricky segmental duplications, which tended to contain shorter assembly contigs. But the overall assembly appeared to be far more contiguous and complete than the original reference assembly gorGor3.

Whereas the largest contig in the original gorilla genome assembly reached just shy of 192,000 bases, for example, the latest assembly reportedly contained contigs stretching to more than 36.2 million bases. The number of contigs dipped from almost 464,900 in gorGor3 to fewer than 16,100 in the Susie3 assembly.
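To put those contiguity figures in proportion, the comparison can be sketched as simple arithmetic. The snippet below is only an illustration: the constants are the rounded figures quoted above ("just shy of," "more than," "almost," "fewer than"), not exact values from the assemblies, and the variable and function names are hypothetical.

```python
# Rounded figures as quoted in the article; the true values differ slightly
# because the article reports them only as approximate bounds.
GORGOR3_LARGEST_CONTIG = 192_000       # "just shy of 192,000 bases"
SUSIE3_LARGEST_CONTIG = 36_200_000     # "more than 36.2 million bases"
GORGOR3_CONTIG_COUNT = 464_900         # "almost 464,900"
SUSIE3_CONTIG_COUNT = 16_100           # "fewer than 16,100"


def fold_change(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


# Growth in the largest contig, gorGor3 -> Susie3.
largest_contig_fold = fold_change(GORGOR3_LARGEST_CONTIG, SUSIE3_LARGEST_CONTIG)

# Reduction in the number of contigs, expressed as gorGor3 count / Susie3 count.
contig_count_reduction = fold_change(SUSIE3_CONTIG_COUNT, GORGOR3_CONTIG_COUNT)

print(f"Largest contig grew roughly {largest_contig_fold:.0f}-fold")
print(f"Contig count fell roughly {contig_count_reduction:.0f}-fold")
```

On these rounded inputs, the largest contig is on the order of 190 times longer and the contig count is on the order of 29 times smaller, which is the sense in which the new assembly has "fewer, and longer" contigs.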

Overall, the new assembly was only slightly larger once its contigs were arranged in the genome, coming in at nearly 2.9 billion bases compared with just over 2.7 billion bases for gorGor3. Even so, the team was able to fill in or shrink more than 90 percent of the almost 434,000 sequence gaps that were estimated to exist in the gorGor3 assembly, retrieving some 87 percent of the reference exons that appeared to be missing from gorGor3.

In their analyses of the genome so far, the researchers have gotten new clues to gorilla population history, a more detailed annotation of genes and regulatory elements in the gorilla genome, and insights into sequences that have diverged between humans and gorillas, including those coding for components of sensory perception, skin keratin, immunity, metabolism, and other pathways.

The team noted that other evolutionary and biological clues will likely come as other draft genome assemblies — including the chimpanzee genome — are updated, improved, and more fully annotated.

"The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome," Eichler and his colleagues wrote.