Graham Ruby says it all started with a "bit of a metagenomics problem." As a postdoc in Joseph DeRisi's lab at the University of California, San Francisco, Ruby was working to distinguish novel viral sequences in clinical mucus samples from infants with idiopathic respiratory illnesses.
"Metagenomic data sets are typically pretty challenging because they're not a genome — they're a mixture of many different genomes, each of which is present at a different level of coverage," Ruby says. "For some parts of the data you might have barely enough sequence to be able to put together the genome sequence," while for others in a complex sample, "you may have hundreds or a thousand-fold the amount of sequence you would need in order to put together the genome sequence. What I really needed was something that could accommodate both of those cases simultaneously."
Ruby then created the paired-read iterative contig extension (PRICE) Assembler. At the Genetics Society of America meeting this past June, DeRisi announced that it would be made available to the public by year's end.
Ruby says that PRICE works by quantifying paired-end sequences to gauge the level of coverage for a particular genome; in this way, PRICE can dynamically scale its assembly requirements to match the level of coverage provided by each of the different genomes in a sample.
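The idea of scaling assembly requirements to coverage can be illustrated with a toy sketch. This is not PRICE's code or its actual algorithm; the function names, the divisor of 10, and the floor of two reads are all hypothetical choices made for illustration. The sketch simply computes an average coverage per genome and sets a stricter read-support requirement where coverage is higher.

```python
def estimate_coverage(mapped_read_bases, genome_length):
    """Average depth: sequenced bases assigned to a genome divided by its length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return mapped_read_bases / genome_length


def min_supporting_reads(coverage, divisor=10, floor=2):
    """Hypothetical rule: demand more agreeing reads where coverage is deep,
    but never fewer than `floor`, so sparsely covered genomes can still assemble."""
    return max(floor, int(coverage // divisor))


def plan_requirements(genomes):
    """genomes maps a name to (mapped_read_bases, genome_length).
    Returns a per-genome coverage estimate and read-support requirement."""
    plan = {}
    for name, (bases, length) in genomes.items():
        coverage = estimate_coverage(bases, length)
        plan[name] = {
            "coverage": coverage,
            "min_reads": min_supporting_reads(coverage),
        }
    return plan


if __name__ == "__main__":
    # Two made-up genomes in one sample: one barely covered, one very deeply covered.
    sample = {"rare_virus": (3000, 1000), "abundant_virus": (1000000, 1000)}
    for name, requirements in plan_requirements(sample).items():
        print(name, requirements)
```

In this sketch the sparsely covered genome keeps a lenient requirement while the deeply covered one gets a much stricter one, which is the general behavior Ruby describes, not a statement of how PRICE implements it.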
"PRICE subdivides the problem of assembly into these subtasks, where for some pieces you might have very high coverage [and] for some of the subtasks you might have very low coverage," Ruby says. That way, once the sample is broken down and scaled, "I only have to worry about the complexity of my small data set that's a subdivision of the total task," he adds.
Scaling the assembly operation down into manageable undertakings will not only make whole-genome assembly more accessible to "ordinary molecular biologists," Ruby says, but also allow the process to be "handled by normal computers."