Exome Analysis on the Cloud

By Matthew Dublin

A post over at Blue Collar Bioinformatics describes a distributed exome analysis pipeline that lets users run best-practice software alongside customized code in the cloud. All the components in the pipeline are open source and supported by large user communities. The setup uses CloudMan, a dynamically scalable version of Galaxy, to build a full Sun Grid Engine (SGE) cluster environment, and CloudBioLinux, a Linux-based collection of bioinformatics software. RabbitMQ handles communication between cluster nodes, and an automated pipeline written in Python organizes parallel processing across the cluster.
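To give a rough feel for how a message queue can spread work across a cluster, here is a minimal Python sketch. It is not the pipeline's actual code: it swaps in the standard library's in-process queue and threads for RabbitMQ and the cluster nodes, and the names process_sample, worker, and run_pipeline are hypothetical, as is the placeholder "analysis" step that only renames the file.

```python
import queue
import threading


def process_sample(fastq_name):
    # Hypothetical placeholder for real work (alignment, variant calling).
    # Here it only derives an output file name from the input name.
    return fastq_name.replace(".fastq", ".vcf")


def worker(tasks, results):
    # Each worker stands in for a cluster node pulling messages off the queue.
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        results.put((item, process_sample(item)))


def run_pipeline(fastq_files, num_workers=3):
    tasks = queue.Queue()    # stand-in for the message broker
    results = queue.Queue()  # collects finished work from all workers
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for name in fastq_files:
        tasks.put(name)
    for _ in threads:
        tasks.put(None)      # one sentinel per worker
    for t in threads:
        t.join()

    finished = {}
    while not results.empty():
        name, output = results.get()
        finished[name] = output
    return finished


if __name__ == "__main__":
    print(run_pipeline(["sample1.fastq", "sample2.fastq"]))
```

In a real deployment the queue would live in a broker such as RabbitMQ and the workers would run on separate machines, but the pattern of posting tasks and letting idle nodes pick them up is the same.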

The post includes a few videos that explain the process, starting with establishing a cloud cluster on Amazon Web Services EC2 servers by following the CloudMan setup instructions.

After you've booted up your cloud, move over to the CloudMan web interface on the server and start up an instance using this shared identifier:

cm-0011923649e9271f17c4f83ba6846db0/shared/2011-08-19--21-00

Head over to the post to find more instructions on configuring RabbitMQ messaging to communicate between the nodes, running analysis with FASTQ input files, monitoring the running process, and retrieving results.