Chromosome-Scale Selective Sweeps and Genomic Diversity in C. elegans
Andersen, Gerke et al., Nature Genetics
Researchers at Princeton University and elsewhere discuss the effects of chromosome-scale selective sweeps on genomic diversity in Caenorhabditis elegans. Using a high-throughput selective sequencing approach on a collection of 200 wild C. elegans strains, the team found that the nematode's "genome variation is dominated by a set of commonly shared haplotypes on four of its six chromosomes, each spanning many megabases." The team also reports on population genetic modeling showing that "this pattern was generated by chromosome-scale selective sweeps that have reduced variation worldwide; at least one of these sweeps probably occurred in the last few hundred years."
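One standard way to quantify the reduced variation a sweep leaves behind is nucleotide diversity (π): the average number of pairwise differences per site among sampled sequences. The minimal Python sketch below computes π. The haplotype strings are made-up toy data, not from the study, and this is not the authors' analysis pipeline.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Return pi: the mean number of pairwise differences per site.

    All sequences must be aligned and of equal length.
    """
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


# Toy data (not from the study): strains carrying varied haplotypes...
diverse = ["ACGTACGT", "ACGAACTT", "TCGTACGA", "ACCTAGGT"]
# ...versus strains that mostly share one swept haplotype.
swept = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGA"]
print(nucleotide_diversity(diverse), nucleotide_diversity(swept))
```

When most strains share one haplotype, nearly every pairwise comparison contributes zero differences, so π collapses across the swept region.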
Fighting Disease with iPhones and Big Data
Skin Scan, a startup iPhone app developer based in Bucharest, Romania, has big plans to fight and track skin cancer. Its app, also called Skin Scan, lets users snap pictures of questionable moles or lesions, which are then sent to the company's servers, where a proprietary algorithm analyzes them. While the app does not yet provide an accurate diagnosis, the algorithm identifies abnormalities, rates each one from low-risk to high-risk, and then refers users to local dermatologists.

Skin Scan is building an analytic database from users' photographs and results, including location data, in order to create a time-space map model of the severity and frequency of lesions.
Because skin cancer is best analyzed over time, this data may be useful not only to physicians but also to government and academic researchers tracking cancer, assuming it can be sufficiently de-identified.
The developer also plans to connect doctors and users directly to eliminate in-person office visits.
In discussions of personalized medicine, the idea that patients might soon carry their genomes in their pockets or on mobile devices is often batted around, but its viability and execution are rarely explored. Technologies such as Skin Scan could prove to be a good test case for linking patients, physicians, and personalized medical data in a way that integrates instantaneous communication and real-time data analysis with consumer electronic devices.
Cray Now Offering $200,000 Supercomputer
In an effort to reach researchers who have limited funding but want to own their own supercomputer, Cray is now offering a line of commodity supercomputers with a starting price tag of $200,000.
Cray's entry-level offering combines the software support previously reserved for Cray CX1 and Cray CX1000 systems with the petascale capabilities of the Cray XE6m and Cray XK6m line. The $200,000 system also comes equipped with Cray's Gemini interconnect, the latest version of the Cray Linux Environment, AMD Opteron 6200 Series processors, and GPUs.
"Cray's new entry-level configurations leverage its deep HPC technology portfolio to create purpose-built systems for the departmental technical computing market segment," said Earl Joseph, IDC program vice president for HPC. "This segment was worth around $3 billion in 2011 and IDC projects that it will grow at a healthy 7 percent to 8 percent CAGR through 2015."
The new "affordable" supercomputer is not really a full-fledged supercomputer per se, but rather a blade server configuration: essentially a baby XE6m with six blades and 48 sockets of Opteron 6200s. The server rack is capable of 6.5 teraflops, which comes out to about $30,769 per teraflop.
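The price-per-teraflop figure is simply the starting price divided by the system's rated performance. This small Python helper makes the arithmetic explicit, using only the two numbers quoted in the text.

```python
def dollars_per_teraflop(price_usd: float, teraflops: float) -> float:
    """Price divided by rated performance."""
    return price_usd / teraflops


cost = dollars_per_teraflop(200_000, 6.5)
print(f"${cost:,.0f} per teraflop")  # prints "$30,769 per teraflop"
```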
These entry-level systems could suit researchers who want to develop code for larger-scale systems, such as Blue Waters at the National Center for Supercomputing Applications at the University of Illinois or the Titan supercomputer at Oak Ridge National Laboratory.
What it Takes to Get to Exascale
Science has an article discussing what it will take to make exascale computing a reality. These systems, which do not yet exist, would be capable of performing 10 to the 18th power floating-point operations per second, or one exaflop.
Exascale supercomputers would be nearly 100 times more powerful than today's fastest supercomputer, the K Computer at Japan's Riken institute, which is currently rated at roughly 11.3 petaflops. All the major supercomputing powers, including the US, China, Japan, Russia, India, and the EU, are racing to build a viable exascale system.
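The size of that jump follows from dividing one exaflop (10^18 floating-point operations per second) by the K Computer's 11.3 petaflops. This Python check uses only those two figures from the text and puts the ratio at a little under 90.

```python
EXAFLOP = 1e18        # floating-point operations per second
K_COMPUTER = 11.3e15  # roughly 11.3 petaflops, as quoted in the text

ratio = EXAFLOP / K_COMPUTER
print(f"Exascale is about {ratio:.0f}x the K Computer")
```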
However, the challenges of energy efficiency and sustained performance are formidable, not to mention developing brand new programming models for these huge systems.
Even though computer hardware has seen a steady increase in performance over the last few decades, incremental advances will not be enough to actually achieve exascale performance. Exascale won't simply be a matter of building a really, really large supercomputer center crammed to the ceiling with the latest server blades; it will require an entirely new processor and interconnect architecture.
Intel has unveiled its 50-core Knights Corner coprocessor and its Xeon E5 server chips as steps toward exascale by the year 2018. These chips are designed for high core counts as well as low energy consumption.
Sometimes the need for completely new hardware to accommodate the perpetual growth in research data gets lost: folks still think the cloud can save them when, for example, genomics datasets reach the exascale mark. Unfortunately, an exascale cloud can't exist until there is exascale hardware to make it float.
API for Statistical Phylogenetics with HPC
Researchers at the University of Maryland have developed BEAGLE, an application programming interface and specialized library for high-performance statistical phylogenetic inference that allows existing software packages to make more effective use of available computer hardware including GPUs, CPUs with Streaming SIMD Extensions, and multi-core CPUs via OpenMP.
The team describes the work in Systematic Biology, writing that "a specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future."
BEAGLE is compatible with Mac, Windows, and Linux operating systems and is freely available for download.
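The core calculation that a library like BEAGLE offloads to GPUs and vector units is the tree likelihood computed with Felsenstein's pruning algorithm, repeated across many alignment sites and candidate trees. The minimal pure-Python sketch below computes the likelihood of a single site under the Jukes-Cantor substitution model. The nested-list tree format is a hypothetical convenience, and this is not BEAGLE's API.

```python
import math

BASES = "ACGT"


def jc_prob(i: int, j: int, t: float) -> float:
    """Jukes-Cantor probability that base i becomes base j along a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partials(node):
    """Conditional likelihood of the data below `node`, one value per base.

    A leaf is a single base such as "A"; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        below = partials(child)
        for i in range(4):
            result[i] *= sum(jc_prob(i, j, t) * below[j] for j in range(4))
    return result


def site_likelihood(tree) -> float:
    """Likelihood of one alignment column, assuming equal base frequencies at the root."""
    return sum(0.25 * p for p in partials(tree))


# Hypothetical rooted tree ((A:0.1, C:0.2):0.05, A:0.3) for a single site.
tree = [([("A", 0.1), ("C", 0.2)], 0.05), ("A", 0.3)]
print(site_likelihood(tree))
```

Every site and every internal node repeats the same small matrix-vector products. That regular, independent work is what maps well onto GPU threads and SIMD lanes.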
Because the computational burden of searching for epistasis in genome-wide association study data is often prohibitive, a team from the Roslin Institute at the University of Edinburgh has developed a powerful, low-cost implementation of a search algorithm on GPUs using OpenCL. The team published a paper in Bioinformatics describing the GPU implementation, which achieved a 92-fold speedup of an exhaustive epistasis scan for a quantitative phenotype.
In their paper, the authors write that "to achieve a comparable computational improvement without a graphics card would require a large compute-cluster, an option that is often financially non-viable. The implementation presented uses OpenCL—an open-source library designed to run on any commercially available GPU and on any operating system."
Their software, called EpiGPU, is open-source and GPU-vendor independent, meaning that it should run on any OpenCL-capable GPU card.
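An exhaustive epistasis scan tests every pair of SNPs, so the work grows with the square of the number of markers. That quadratic growth is what makes GPUs attractive. The CPU-only Python sketch below shows the shape of such a scan for a quantitative phenotype. It assumes a simple one-way F-test across the nine two-locus genotype classes, with genotypes coded 0/1/2, and it illustrates the technique only. It is not EpiGPU's code or necessarily its exact test statistic.

```python
from itertools import combinations


def pair_f_statistic(g1, g2, y):
    """One-way ANOVA F statistic for phenotype y across the (up to nine)
    two-locus genotype classes formed by SNPs g1 and g2 (coded 0/1/2)."""
    groups = {}
    for a, b, value in zip(g1, g2, y):
        groups.setdefault((a, b), []).append(value)
    n, k = len(y), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or samples to test
    grand_mean = sum(y) / n
    means = {key: sum(v) / len(v) for key, v in groups.items()}
    ss_between = sum(len(v) * (means[key] - grand_mean) ** 2 for key, v in groups.items())
    ss_within = sum((x - means[key]) ** 2 for key, v in groups.items() for x in v)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def exhaustive_scan(genotypes, y):
    """Score every SNP pair; genotypes is a list of per-SNP genotype vectors.
    The number of tests grows as m * (m - 1) / 2 for m SNPs."""
    return {
        (i, j): pair_f_statistic(genotypes[i], genotypes[j], y)
        for i, j in combinations(range(len(genotypes)), 2)
    }


# Toy data: three SNPs across six individuals (hypothetical values).
snps = [[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1], [2, 2, 1, 1, 0, 0]]
phenotype = [1.0, 1.2, 2.1, 3.9, 2.0, 6.1]
best = max(exhaustive_scan(snps, phenotype).items(), key=lambda kv: kv[1])
print(best)
```

Each pair is scored independently of every other pair, which is why the scan parallelizes naturally across thousands of GPU threads.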
Amazon Rolls out NoSQL Database Service
Amazon Web Services (AWS) has launched a fully managed NoSQL database service in the cloud, called DynamoDB, that aims to provide seamless scalability on the fly. AWS says the service offloads administrative tasks such as hardware provisioning, setup, configuration, replication, software patching, and cluster scaling.
According to their announcement, "developers can create a database table that can store and retrieve any amount of data, and serve any level of request traffic. DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent, fast performance. All data items are stored on Solid State Disks and are automatically replicated across multiple Availability Zones in a Region to provide built-in high availability and data durability."
Amazon's CTO Werner Vogels has a post on his blog discussing the announcement, where he describes DynamoDB as the result of 15 years of "learning" in the areas of large-scale non-relational databases and cloud computing. "Several years ago we published a paper on the details of Amazon’s Dynamo technology, which was one of the first non-relational databases developed at Amazon. The original Dynamo design was based on a core set of strong distributed systems principles resulting in an ultra-scalable and highly reliable database system."
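The Dynamo paper that Vogels mentions describes spreading keys over servers with consistent hashing. Nodes and keys are hashed onto a ring, and each key is stored on the next N distinct nodes clockwise (its "preference list"). The Python sketch below illustrates that technique with MD5 and virtual nodes. The node names and parameters are hypothetical, and DynamoDB, as a managed service, does not expose or necessarily use this exact scheme.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes, in the style of the Dynamo paper."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node owns several points ("virtual nodes") on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 maps keys and node labels onto a 64-bit ring position here.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def preference_list(self, key: str, n: int = 3):
        """Return the first n distinct nodes found walking clockwise from the key."""
        start = bisect.bisect(self._points, self._hash(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("sample-42"))  # three distinct replica nodes
```

With this scheme, adding or removing a node moves only the keys adjacent to its ring points rather than reshuffling the whole table.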
Many NoSQL databases have no strict schema, so data is often denormalized into a few very wide tables in which each row can store a large amount of data. Denormalization introduces data redundancy, which can require more storage space and computational power than an equivalent normalized SQL database.
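The redundancy point can be made concrete with a toy example. The same sample records are stored normalized (shared study metadata kept once) and denormalized (metadata copied into every item), with size measured as JSON bytes. The records and field names are hypothetical, and JSON size is only a rough proxy for what any particular database stores on disk.

```python
import json

# Hypothetical records. Normalized layout: study metadata is stored once...
studies = {"S1": {"organism": "C. elegans", "platform": "Illumina", "lab": "Example Lab"}}
samples = [{"sample_id": f"X{i}", "study_id": "S1", "reads": 1000 + i} for i in range(3)]

# ...denormalized layout: every item carries its own copy of that metadata.
items = [{**sample, **studies[sample["study_id"]]} for sample in samples]


def json_size(obj) -> int:
    """Rough storage proxy: bytes of the JSON encoding."""
    return len(json.dumps(obj).encode("utf-8"))


normalized = json_size(studies) + json_size(samples)
denormalized = json_size(items)
print(f"normalized: {normalized} bytes, denormalized: {denormalized} bytes")
```

The trade-off runs the other way on reads: a denormalized item can be fetched in one lookup, with no join.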
AWS might attract genomics customers with this offering, since NoSQL databases in the cloud have already been used in several omics research projects. For example, last October, Monsanto deployed Cloudant's NoSQL database as the foundation of its genomics data analysis system.
DynamoDB users can get started with a free tier that allows 40 million requests per month free of charge. Additional request capacity is priced at hourly rates as low as $0.01 per hour for 10 units of write capacity or 50 units of strongly consistent read capacity, with replicated solid state disk storage at $1 per GB per month.
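Those launch prices translate into a monthly bill by multiplying provisioned capacity by the hourly rate. The Python sketch below assumes a 30-day month, ignores the free tier, and uses only the rates quoted in the text, which may since have changed. The workload in the example is hypothetical.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month

# Launch rates quoted in the text: $0.01/hour buys 10 write units or
# 50 strongly consistent read units; storage is $1 per GB-month.
WRITE_RATE = 0.01 / 10  # dollars per write unit per hour
READ_RATE = 0.01 / 50   # dollars per read unit per hour
STORAGE_RATE = 1.00     # dollars per GB per month


def monthly_cost(write_units: int, read_units: int, storage_gb: float) -> float:
    """Estimated monthly bill for provisioned capacity, ignoring the free tier."""
    hourly = write_units * WRITE_RATE + read_units * READ_RATE
    return hourly * HOURS_PER_MONTH + storage_gb * STORAGE_RATE


# Hypothetical workload: 100 write units, 500 read units, 20 GB of data.
print(f"${monthly_cost(100, 500, 20):.2f} per month")  # prints "$164.00 per month"
```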
GPU-Based Cluster Aids Nanocarrier Simulations
A team at the University of Illinois at Chicago is using both traditional and GPU-based clusters at the National Center for Supercomputing Applications (NCSA) to study nanocarriers. Like empty bullet casings, nanocarriers could provide a targeted way to deliver drugs that kill cancer cells.
The NCSA's clusters enabled the researchers to perform extensive atomistic molecular dynamics simulations of polyethylene glycol (PEG)-ylated phospholipid dendron-based micelles (aggregates of surfactant molecules dispersed in a liquid colloid), characterizing the micelles in pure water and in ionic solutions.
"Our simulations are massive," says principal investigator Petr Kral. "They have up to 750,000 atoms and they need to be calculated for a relatively long time, up to 30 nanoseconds. That is why the supercomputer was very useful to us and very necessary."
Kral and his collaborators had developed their own GPU-based computer system in their lab, but it lacked the power these simulations need. Their results were published last year in the Journal of the American Chemical Society and Chemical Communications.
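To see why simulations of this size need a supercomputer, count the work. Molecular dynamics advances every atom through millions of small time steps, and each step requires computing forces on every atom. The arithmetic below assumes a 2-femtosecond time step, a common choice that the article does not state, alongside the 750,000 atoms and 30 nanoseconds Kral quotes.

```python
ATOMS = 750_000
SIMULATED_FS = 30 * 1_000_000  # 30 ns expressed in femtoseconds
TIMESTEP_FS = 2                # assumption: a typical MD step, not stated in the article

steps = SIMULATED_FS // TIMESTEP_FS
atom_updates = steps * ATOMS   # lower bound: each update also needs force evaluations
print(f"{steps:,} steps, {atom_updates:.2e} atom updates")
```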
GPU-Accelerated Short Read Aligner
Researchers from the University of Cambridge and University College Cork have released BarraCUDA, GPU-accelerated short-read DNA sequence alignment software based on BWA.
The team used Nvidia's Compute Unified Device Architecture (CUDA) to implement the software on GPUs. BarraCUDA demonstrated a throughput six times that of a single CPU core for gapped alignment, and even higher when gap opening is disabled.
They describe BarraCUDA in BMC Research Notes.
According to the team, when it comes to implementing alignment software, multiple GPUs scale better than CPUs. They write that "a normal computer can easily take up 4 GPUs, meaning that using this test library as an example, a single-end alignment can be done in 5 min, which is twice the speed of a high-end 12-core workstation. Using 8X GPU, we can achieve an alignment speed 3X faster than a traditional computing node with 12 CPU cores, making GPU nodes a more favourable option, in terms of HPC environment, than using those with CPUs."
BarraCUDA is available for download.
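BWA, and therefore BarraCUDA, finds read matches by searching a Burrows-Wheeler-transformed index of the reference (an FM-index). The pure-Python sketch below shows only the exact-match backward search at the heart of that approach, with a naive index build suitable for tiny toy references. The real tools add inexact matching with mismatches and gaps and use compressed data structures. In BarraCUDA's case, CUDA kernels also search many reads in parallel.

```python
def build_fm_index(reference: str):
    """Naive FM-index: suffix array, C table and occurrence counts of the BWT.
    Fine for toy strings; real aligners build compressed indexes instead."""
    text = reference + "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total  # number of characters in text smaller than ch
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)  # count of ch in bwt[:i + 1]
    return sa, c_table, occ


def find_matches(pattern: str, index):
    """Exact backward search: extend the match one character at a time, right to left."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGTTACG")
print(find_matches("ACG", index))  # prints [0, 4, 9]
```

Each read's search touches only a few table entries per base, and reads are independent of one another. That independence is what lets a GPU process many reads at once.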

In this video, we speak with Todd Smith, senior leader of research and applications at PerkinElmer, and researchers from Stockholm University about accelerating the GROMACS molecular dynamics software suite with GPUs and CPUs.
CLC Bio and Sciengines have announced a collaboration to offer BLASTp, BLASTn, and Smith-Waterman on Sciengines' RIVYERA FPGA-based platform.
The RIVYERA hardware platform can run the BLAST implementation, which is still under development, on up to 128 FPGAs in each compute unit. An early version of this solution will be showcased at the International Plant & Animal Genome (PAG XX) conference on January 14-18 in San Diego.
According to Jost Bissel, chief software architect at Sciengines, initial results from running CLC's tuned BLASTp on Sciengines' FPGA solution showed a 188-fold speedup using 64 FPGAs compared with a single Xeon core. "The benchmark ran BLASTp to align 920000 amino acids against a database of 1 billion amino acids. Similar acceleration has been achieved in early benchmark tests of the BLASTn version, and we expect both BLAST implementations to be accelerated even further before the final release," says Bissel.
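Smith-Waterman, one of the three algorithms in the collaboration, fills a dynamic-programming matrix in which each cell depends only on its left, upper, and upper-left neighbors. FPGA designs typically exploit this by computing whole anti-diagonals of cells in parallel. The Python sketch below is a plain sequential version that returns the best local alignment score with a linear gap penalty. The scoring parameters are arbitrary example values, not those used by CLC or Sciengines.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty.

    Keeps only two rows of the matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero (start fresh instead).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTT", "ACG"))  # prints 6
```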
Also at the PAG XX conference, Pico Computing will demonstrate its FPGA-accelerated BFAST solution, which runs on its M-502 FPGA modules. According to the company's white paper, this BFAST implementation is 100 times faster than BFAST running in software and ten times faster than Bowtie. The FPGA system maps 92 percent of short reads, versus 85 percent for Bowtie, and this sensitivity can be tuned further. In addition, Pico has integrated its FPGA system with Geneious Pro's plugin API to create a visualization and analysis interface.
Timelogic, Mitrionics, Convey, and SGI have also released FPGA BLAST implementations. Because every offering uses a different hardware configuration, it's difficult to compare FPGA BLAST solutions against one another, but all of them report impressive results compared with a CPU.
A Few Hundred Genomes in Your Pocket
Victorinox, maker of the world-famous Swiss Army Knife, has surprisingly become the first to offer a 1 TB USB stick, the world's largest thumb drive. The drive is available on its own or with a pair of scissors and a knife. It can be accessed via USB 2.0, USB 3.0, or eSATA, has AES 256-bit encryption, and has a 48 x 96 dot monochrome LCD display with room enough for a device label or some indicator of the drive's contents.
The thumb drive, which was exhibited at this week's CES conference in Las Vegas, comes with a price tag of $2,000, so if you're the type of person who is apt to misplace their car keys often, you might want to skip this one.
In theory, you could store roughly 340 human genomes on this drive (not including annotations and other data, of course), which raises the question: could snail mail make a comeback as a data transfer method for research collaborations? Sending a box of USB drives would be a lot cheaper than shipping a crate of disk arrays or hard drives, and possibly quicker than uploading the data to the cloud.
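Both back-of-the-envelope claims can be checked with simple arithmetic. The Python sketch below assumes about 3.0 billion bases per genome stored uncompressed at one byte per base, plus a 100 Mbit/s sustained upload link. All three values are assumptions, not figures from the article. The result lands near the roughly 340 genomes quoted in the text, and shows that pushing a full terabyte over such a link takes most of a day.

```python
TERABYTE = 10**12  # the drive's 1 TB capacity, in bytes (decimal units)


def genomes_per_drive(capacity_bytes=TERABYTE, bases=3.0e9, bytes_per_base=1):
    """How many uncompressed genomes fit, at an assumed size per base."""
    return int(capacity_bytes // (bases * bytes_per_base))


def upload_hours(n_bytes=TERABYTE, link_mbps=100):
    """Hours to push n_bytes over a link sustaining link_mbps megabits per second."""
    return n_bytes * 8 / (link_mbps * 1e6) / 3600


print(genomes_per_drive(), round(upload_hours(), 1))  # prints 333 22.2
```

Packing bases at 2 bits each (`bytes_per_base=0.25`) would roughly quadruple the genome count.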
These drives also sport some pretty formidable security. If the drive is plugged into an unauthorized computer, it immediately emails its owner and, if no reply is received, zaps the flash memory and deletes the data.

Small and Smart Storage Boxes for LIMS
While there are a lot of high-end storage systems marketed to large-scale sequencing operations, it's not every day that you come across specially designed desktop storage units that natively host scientific data management software for the individual researcher or small lab. BioTeam has hacked a few Drobo storage arrays to embed its own MiniLIMS software directly inside the array. This isn't a product that BioTeam is offering, but it's an interesting example of successful hacking for research.
Running LIMS as an application inside a desktop-sized storage array is worth a look because, according to BioTeam, it's not inconceivable that such "smart storage" devices could soon replace the PC workstations that operate laboratory instruments.
"As storage units get smarter and more capable the need for a dedicated Windows PC attached to an instrument or Genome Sequencer becomes less important….Something like this seems attractive for single-instrument genomics environments or labs where dedicated research IT staff may not be easily available," they wrote on their blog.
Cost-Conscious Cloud-App Development
Thanks to IBM, you might be hearing a lot more about a cloud computing development platform called Green Hat in the future.
Green Hat, which has actually been in business since 1996, allows software developers to kick the tires and work out the kinks on their cloud software before it actually gets to the cloud.
IBM has purchased the company in order to add its technology to IBM's Rational Software development platform.
Green Hat's virtual environment simulates a wide range of IT infrastructure configurations and headaches, allowing an institution to bypass painful parts of the software development process. Coding for the cloud can cost money and definitely eats up hours in the lab, even for an experienced programmer with an Amazon AWS account.
Getting popular bioinformatics applications, which typically run on clusters or workstations, to operate smoothly in the cloud is still anything but trivial, so a testing environment that allows for some growing pains could prevent lots of frustration and wasted grant money.
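Green Hat's own tooling is proprietary, but the underlying idea of testing cloud-bound code against simulated infrastructure can be sketched with Python's standard `unittest.mock`. In this sketch, the `upload_results` function, the bucket name, and the file names are hypothetical. The `put_object` keyword arguments only mirror the shape of boto3's S3 client call, and no real cloud service is contacted. The simulated client fails once with a transient error, so the retry path can be exercised without spending anything.

```python
from unittest import mock


def upload_results(client, bucket, results):
    """Hypothetical pipeline step: upload each result file, retrying once
    if the storage service reports a transient connection error."""
    uploaded = []
    for name, data in results.items():
        for attempt in range(2):
            try:
                client.put_object(Bucket=bucket, Key=name, Body=data)
                uploaded.append(name)
                break
            except ConnectionError:
                if attempt == 1:
                    raise  # second failure: give up and surface the error
    return uploaded


# The simulated client stands in for real cloud storage: its first call
# raises a transient error and later calls succeed, at no cost.
fake_client = mock.Mock()
fake_client.put_object.side_effect = [ConnectionError("simulated outage"), None, None]

result = upload_results(fake_client, "hypothetical-bucket", {"a.bam": b"...", "b.bam": b"..."})
print(result, fake_client.put_object.call_count)  # prints ['a.bam', 'b.bam'] 3
```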