In Genome Biology, an international team of researchers reports on the CLARITY Challenge, which aimed to determine best practices for using genomic tools to diagnose genetic disease. Thirty international groups analyzed DNA samples from three families affected by heritable genetic disorders, with the goal of identifying the disease-causing variants and reporting their results in a clinically useful way. Although the team says there was broad consensus on how to analyze and interpret the data, only two groups identified the consensus candidate gene variant for all three families. "There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups," the researchers say. A judging panel named Brigham and Women's Hospital the winner of the contest, followed by groups at Genomatix, CeGaT, University Hospital of Bonn, and the University of Iowa.
Also in Genome Biology, researchers from the University of California, Davis, the University of Maryland, and Johns Hopkins University describe how they sequenced and assembled the loblolly pine genome, the largest genome sequenced to date. To sequence the tree, the researchers used a combination of haploid DNA from a single pine seed megagametophyte and diploid DNA from needle tissue. To assemble the genome, they combined k-mer-based and overlap-layout-consensus assembly methods. "Our combined strategy resulted in the most complete and contiguous conifer (gymnosperm) genome sequenced and assembled to date with an assembled reference sequence consisting of 20.1 billion base pairs contained in scaffolds spanning 22.18 billion base pairs," the investigators write.
GenomeWeb Daily News has more on this and related work.
Researchers from the Wellcome Trust Sanger Institute and the University of Oxford present their evaluation of scaffolding tools for next-generation sequencing data and find that the tools' results vary. They examined 10 tools, among them Bambus2, GRASS, and SOPRA, along with the scaffolding modules of the ABySS, SGA, and SOAPdenovo2 assemblers, using both simulated and real data sets. "Generally, the software performed very well on simulated data, with many runs producing perfect scaffolds. The difficulties arose when real libraries and genomes were used," the researchers note. Overall, they say SOPRA seems "to strike the best balance between aggressively making joins, with a reasonably low error rate." Other tools, they add, may be better suited to users who are more concerned with speed or with minimizing errors.