DNA fragment assembly doesn’t have to be so difficult, says Pavel Pevzner, a computer science professor at University of California, San Diego. He’s been pushing a mathematical solution called the Eulerian method instead of the traditional Hamilton approach, and he’s starting to win converts.
Pevzner sees fragments as groups of edges and vertices, and he says that the current overlap method for assembly requires matching one vertex to another — a process that “simply wouldn’t work for very long sequences,” he argues. The result: more and shorter contigs, which means that the brunt of the assembly work falls to the hundreds of finishers toiling in major sequencing centers.
Instead of thinking of assembly as a traveling salesman problem (visiting each vertex once), Pevzner believes people should instead look at it as a path traversing each edge once. It’s a subtle distinction. But computationally, it makes all the difference in the world. “People have been trying to solve the traveling salesman problem for 50 years now and probably never will solve it,” Pevzner says. The edge method, on the other hand, borrows from the 18th-century work of mathematician Leonhard Euler, and there are already computational algorithms for problems similar to fragment assembly. The best part: with the Euler process, Pevzner says, “repeats aren’t enemies.”
Bringing this concept into genomics isn’t a new idea. “People knew that the edge version of this problem is much easier computationally,” he remarks. “But [they] didn’t know how to apply this to fragment assembly.” He contends that the latest work with Eulerian superpath problems has proven successful with assembly — more successful than conventional methods. In fact, he’s working with UCSC’s Jim Kent and researchers at Washington University to reassemble some of the more difficult BACs for improved accuracy in draft sequences.
Jim Mullikin, who’s in charge of fragment assembly at the Sanger Centre, says after hearing Pevzner talk about the superiority of Euler, “I had to prove it to myself whether it was really something worthwhile or not.” After a day of quick programming, he wound up with encouraging results. Though he hasn’t implemented the full set of Pevzner’s code, he has incorporated the Eulerian process into Sanger’s whole-genome shotgun assembly. “It does produce very good quality consensus sequence,” he says.
— Meredith Salisbury