The Assemblathon 2 results are out in GigaScience, and Erika Check Hayden at the Nature News Blog writes that they have led to some "soul-searching."
For the competition, 21 teams submitted assemblies of three vertebrate genomes so that the quality of those assemblies could be compared. The genomes — from a budgerigar (Melopsittacus undulatus), a Lake Malawi cichlid (Maylandia zebra, also called Metriaclima zebra), and a boa constrictor (Boa constrictor constrictor) — were sequenced using Illumina, Roche 454, or Pacific Biosciences platforms. The submitted assemblies were then judged on 10 metrics, including NG50 scaffold or contig length, the number of core genes mapped, and REAPR summary score.
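NG50 is worth a brief gloss, since it differs from the more familiar N50 statistic: rather than asking what sequence length covers half of the total assembly, it asks what length covers half of an estimated genome size, which penalizes incomplete assemblies. A minimal sketch of the calculation in Python (the ng50 function and the toy numbers below are illustrative, not drawn from the paper):

    def ng50(lengths, genome_size):
        # Sort sequences longest first and accumulate their lengths;
        # NG50 is the length of the sequence that pushes the running
        # total past half of the *estimated genome size* (N50 would
        # use half of the total assembly size instead).
        half = genome_size / 2
        total = 0
        for length in sorted(lengths, reverse=True):
            total += length
            if total >= half:
                return length
        return None  # the assembly covers less than half the genome

    # For a 200 Mb genome assembled into scaffolds of 50, 40, 30, 20,
    # and 10 Mb, the running total first reaches 100 Mb at the 30 Mb
    # scaffold, so NG50 = 30 Mb.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))  # -> 30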
"Overall, we find that while many assemblers perform well when looking at a single metric, very few assemblers perform consistently when measured by a set of metrics that assess different aspects of an assembly's quality," the researchers led by Ian Korf at the University of California, Davis, Genome Center write in GigaScience. "Furthermore, we find that assemblers that work well with data from one species may not necessarily work as well with others,"
This, Check Hayden says, has led to discussions of the reliability of genome assemblies.
"[A]s a field, we have pretended that genome assembly is a reliable exercise and that the results can be trusted; the Assemblathon 2 paper shows that that's wrong," C. Titus Brown wrote at his blog, Living in an Ivory Basement, when the paper went up on a preprint server.
Keith Bradnam, the first author of the paper, she notes, compares choosing the best assembler to choosing the best pizza. "[T]he notion of a 'best' pizza is highly subjective and the best pizza for one person is almost certainly not going to be the best pizza for someone else," Bradnam wrote in a guest post at Haldane's Sieve, adding that "[w]hat is true for 'making pizzas' is also largely true for 'making genome assemblies.'"