Skip to main content
Premium Trial:

Request an Annual Quote

Pevzner argues for easier assembly

Premium

DNA fragment assembly doesn’t have to be so difficult, says Pavel Pevzner, a computer science professor at University of California, San Diego. He’s been pushing a mathematical solution called the Eulerian method instead of the traditional Hamilton approach, and he’s starting to win converts.

Pevzner sees fragments as groups of edges and vertices, and he says that the current overlap method for assembly requires matching one vertex to another — a process that “simply wouldn’t work for very long sequences,” he argues. The result: more and shorter contigs, which means that the brunt of the assembly work falls to the hundreds of finishers toiling in major sequencing centers.

Instead of thinking of assembly as a traveling salesman problem (visiting each vertex once), Pevzner believes people should instead look at it as a path traversing each edge once. It’s a subtle distinction. But computationally, it makes all the difference in the world. “People have been trying to solve the traveling salesman problem for 50 years now and probably never will solve it,” Pevzner says. The edge method, on the other hand, borrows from the 18th-century work of mathematician Leonhard Euler, and there are already computational algorithms for problems similar to fragment assembly. The best part: with the Euler process, Pevzner says, “repeats aren’t enemies.”

Bringing this concept into genomics isn’t a new idea. “People knew that the edge version of this problem is much easier computationally,” he remarks. “But [they] didn’t know how to apply this to fragment assembly.” He contends that the latest work with Eulerian superpath problems has proven successful with assembly — more successful than conventional methods. In fact, he’s working with UCSC’s Jim Kent and researchers at Washington University to reassemble some of the more difficult BACs for improved accuracy in draft sequences.

Jim Mullikin, who’s in charge of fragment assembly at the Sanger Centre, says after hearing Pevzner talk about the superiority of Euler, “I had to prove it to myself whether it was really something worthwhile or not.” After a day of quick programming, he wound up with encouraging results. Though he hasn’t implemented the full set of Pevzner’s code, he has incorporated the Eulerian process into Sanger’s whole-genome shotgun assembly. “It does produce very good quality consensus sequence,” he says.

— Meredith Salisbury

 

The Scan

Researchers Map Recombination in Khoe-San Population

With whole-genome sequences for dozens of individuals from the Nama population, researchers saw in Genome Biology fine-scale recombination patterns that clustered outside of other populations.

Myotonic Dystrophy Repeat Detected in Family Genome Sequencing Analysis

While sequencing individuals from a multi-generation family, researchers identified a myotonic dystrophy type 2-related short tandem repeat in the European Journal of Human Genetics.

TB Resistance Insights Gleaned From Genome Sequence, Antimicrobial Response Assays

Researchers in PLOS Biology explore M. tuberculosis resistance with a combination of sequencing and assays looking at the minimum inhibitory concentrations of 13 drugs.

Mendelian Disease Genes Prioritized Using Tissue-Specific Expression Clues

Mendelian gene candidates could be flagged for further functional analyses based on tissue-specific transcriptome and proteome profiles, a new Journal of Human Genetics paper says.