Faster, Faster

University of Chicago researchers report using a supercomputer to speed up genome analysis.


The graph they use shows that data extraction takes the most time. Is that correct? I thought alignment would take longer.

mgollery - despite all the attention on alignment, it's usually the other steps that take more time: variant calling, annotation, and writing out files. That's why Bina quotes 3 whole genomes/day on their Box, unless you decide to do silly things like write out the BAM files for archiving, in which case throughput drops dramatically. File I/O (and data movement in general) is a big bottleneck, which is why HPC systems at some genome centers use parallel filesystems like Lustre, as does Beagle in this paper. Of course, if GATK 3 is anywhere near 720x faster (as was presented at AGBT last week) running on 24 cores of standard Intel Xeon using AVX, then who'll need a supercomputer? Or, for that matter, a Bina Box...