NFS & GlusterFS on Amazon's Compute Cluster

By Matthew Dublin

The BioTeam is continuing its efforts to test Amazon's new cc1.4xlarge compute cluster EC2 instances, a version of EC2 designed specifically for high-performance computing. In its latest benchmark tests, the team compares two network file systems, NFS and parallel GlusterFS, in several single-client tests using 900GB ephemeral disks, with the goal of developing methods and techniques for taking advantage of the compute cluster on behalf of its clients. NFS, or Network File System, was originally developed by Sun Microsystems in 1984 and is most often used with Unix operating systems. GlusterFS is a distributed file system used for cloud computing architectures, archival storage, and biomedical data storage. Because both file systems were tested against a single client only, the results do not show how typical jobs on the compute cluster would perform. The team does, however, describe its experience with installation and two-disk RAID0 setup, including wall-clock times for ephemeral disk formatting and replication. The benchmark results cover NFS, GlusterFS, and local access.