A Virtual World


Last month we looked at cloud computing, a compute architecture that provides users with access to large amounts of on-demand computing power through an Internet connection. Amazon, Google, IBM, and a growing number of other vendors are hosting compute clouds where users can go online, open an account, and for mere cents an hour, create virtual compute clusters complete with the desired CPU type and speed, RAM, and storage capacity.

Compute clouds are made possible through virtual machine, or VM, software, which allows users to create multiple virtual computers that function completely independently on one physical machine. There are many benefits to employing VMs, and in the case of cloud computing, they give the cloud provider more bang for the buck because a single server can host numerous virtual compute nodes simultaneously, thereby improving resource utilization and saving energy and maintenance costs.

VM software also has a very practical and increasingly popular application for desktop users who desire the ability to run what would normally be incompatible software in their preferred operating system environment. For example, Windows users can run a Linux distribution simply by installing it on their virtual machine software platform of choice, and run Linux applications seamlessly alongside their native Windows OS. In most cases, the VM appears as any other open application, but in reality, it is an independent, virtual computer with its own operating system running next to the host computer's operating system.

But VMs are not only useful for those looking to reach across the compatibility divide. In addition to cross-operating system application access, this technology also comes in handy if there are certain programs that may only run on older versions — or, in the case of Linux, different distributions — of your current operating system. This is especially the case when you come across a must-have application but you're not ready to install an entirely new operating system just to complete a few tasks.

Biofx on the bandwagon

In addition to VM-savvy desktop users, this technology is also finding favor with bioinformatics software developers, says Michael Brudno, an assistant professor of computational biology at the University of Toronto. Brudno believes that it may actually be more reliable and cheaper to bring the computers to the field's ballooning databases, rather than the other way around; operating systems, after all, are smaller and therefore easier to move over a network connection than enormous datasets. Following this logic, the machines should be in a location where it makes the most sense to do the computation, says Brudno. "A virtual machine is just like a laptop: you can suspend and resume it, just like when you close or open your laptop," he says. "But instead of carrying it with you, you send your VM to another site, where it populates to other compute nodes."
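A rough back-of-the-envelope sketch helps show why moving the machine can beat moving the data. The sizes and link speed below are hypothetical values chosen only for illustration, not figures from Brudno, and the calculation ignores protocol overhead and compression.

```python
def transfer_seconds(size_gb: float, link_mbps: float) -> float:
    """Idealized time to move size_gb gigabytes over a link_mbps link."""
    bits = size_gb * 8 * 1000**3          # decimal gigabytes to bits
    return bits / (link_mbps * 1000**2)   # megabits per second to bits per second


# Hypothetical values, for illustration only.
VM_IMAGE_GB = 4       # assumed size of a suspended VM image
DATASET_GB = 2000     # assumed size of a genomics dataset
LINK_MBPS = 100       # assumed network link speed

vm_time = transfer_seconds(VM_IMAGE_GB, LINK_MBPS)
data_time = transfer_seconds(DATASET_GB, LINK_MBPS)

print(f"VM image: {vm_time / 60:.1f} minutes")
print(f"Dataset:  {data_time / 3600:.1f} hours")
```

Under these assumptions the VM image moves in minutes while the dataset takes the better part of two days, which is the asymmetry behind sending the computer to the data.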

The concept of moving a virtual computer from one physical piece of hardware to another is called migration. "The way it works is that you have a running VM on your home machine and then you migrate the whole running machine from your own site to a remote site, so the result is that you still have network connections to the original place where it came from," Brudno says.

The trick with genomics data has always been processing in parallel, so merely migrating a single VM doesn't cut the compute mustard. Brudno's solution is SnowFlock, a VM software implementation specifically geared toward parallel computation over a cluster. Like other VM software solutions, SnowFlock allows users to migrate a single VM over a network connection. Its usefulness to bioinformatics resides in its ability to clone a user's VM over an entire cluster. This is achieved through an almost instantaneous cloning mechanism that creates exact replicas of the original VM sent by the remote user and then populates each clone throughout the cluster.
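The clone-and-divide idea can be sketched in a few lines of Python. This is only an analogy and not the actual interface of Brudno's software: worker threads stand in for VM clones, and the function names (`clone_work`, `run_on_clones`) and the GC-counting task are hypothetical. Each "clone" starts with an identical view of the data and uses its own ID to pick the share it will process.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(seq: str) -> int:
    """Count G and C bases in one sequence (a stand-in for real analysis)."""
    return sum(1 for base in seq.upper() if base in "GC")


def clone_work(clone_id: int, n_clones: int, sequences: list) -> dict:
    """Work done by one clone: process every n_clones-th sequence, by ID."""
    return {
        i: gc_count(sequences[i])
        for i in range(clone_id, len(sequences), n_clones)
    }


def run_on_clones(sequences: list, n_clones: int) -> list:
    """Start n_clones identical workers and merge their partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_clones) as pool:
        futures = [
            pool.submit(clone_work, cid, n_clones, sequences)
            for cid in range(n_clones)
        ]
        for future in futures:
            merged.update(future.result())
    # Return results in the original input order.
    return [merged[i] for i in range(len(sequences))]


if __name__ == "__main__":
    reads = ["GATTACA", "GGCC", "atat", "CGCGA"]
    print(run_on_clones(reads, n_clones=3))
```

Because every worker begins from the same state and differs only in its ID, no work list has to be shipped to each one separately, which is loosely what makes replicating a whole running VM attractive for this kind of job.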

"We are also looking at this as a solution to multiple laboratories with different needs sharing one compute cluster," Brudno says. VMs can allow for secure, shared usage of a cluster or server farm because the system administrator need only grant each user access to an allotted virtual machine account. For example, if one biological research group wants to share its cluster resources and datasets with another, the guest would merely be given a password to migrate virtual machines onto the cluster rather than direct access to the cluster itself. If security is breached and passwords are compromised, the hacker only has access to the guest VMs and not the entire cluster.

VM apps

VMs are also good for those intimidated by the world of Linux, or those still in the learning stage with bioinformatics applications. A suite of bioinformatics tools packaged in a virtual machine distribution called DNALinux Virtual Desktop Edition offers Windows users the chance to have at their favorite applications, such as HMMER, Bioperl, NCBI Blast, and many others, all contained within a pre-packaged virtual machine distribution based on Xubuntu, an alternative version of the Ubuntu Linux distribution. DNALinux uses a virtual machine platform by VMware, currently one of the more popular VM vendors. DNALinux developer Sebastian Bassi, project leader at the National University of Quilmes in Buenos Aires, says that while there are many live CD bioinformatics software packages available, a VM implementation allows users to save their settings and data. Bassi also says that the target audience for DNALinux is the greenhorn bioinformatics software user who is either not familiar with Linux or does not want to install Linux on his or her own desktop. Users can download DNALinux for free from VMware's official site.

VMs also allow users to put the machine and data together in one self-contained package. Such is the case with WormBase, the central database for C. elegans and other nematodes hosted by Cold Spring Harbor Laboratory. According to Todd Harris, project manager of WormBase, virtual machines solve problems associated with the creation of efficient backups, referential access, and most importantly, easy distribution. Databases like WormBase usually consist of multiple database backends, complicated visual display layers, and a slew of third-party software libraries, all joining together to create the end-user experience. "Any backup strategy needs to capture the underlying databases but also the presentation layer and hosting environment, so simply mirroring the resource onto additional hardware is prohibitively expensive and a maintenance headache," Harris says. "Virtual machines overcome backup challenges by encapsulating all software and databases along with a pre-configured operating system in a single convenient package."

The reason for creating a VM distribution of a Web-based database is that many users like the option of running the resource on their own machine, free of network congestion or server load, or even the need to be online, says Harris. For biotech companies and pharmas involved in expensive R&D projects, VM distribution allows for a certain level of privacy when browsing the database. Harris acknowledges that while there was a small amount of salesmanship involved in getting biologists and other researchers comfortable with the idea of downloading a VM implementation of WormBase onto their desktops, the results have been positive, with a lot of interest coming from users who want to set up their own clusters to support their labs. "A lot of people in the bioinformatics world were not even aware of virtualization, but all you have to do is tell them it's just like opening up a Word document and they get it," Harris says.
