Researchers at Baylor College of Medicine have launched an R-based tool that they claim detects copy number changes in short-read data more accurately and faster than existing packages.
In a paper published recently in PLoS ONE, the investigators write that the tool, called readDepth, detects copy changes in sequences “by measuring the depth of coverage obtained by massively parallel sequencing of the genome” and that, unlike other methods, it does not require users to have a sequenced reference sample for comparison.
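To make "depth of coverage" concrete, here is a minimal Python sketch that counts how many reads start in each fixed-width window along a chromosome. It is an illustration only and is not taken from readDepth, which is an R package. The function name `bin_read_counts`, the 100-bp window, and the toy coordinates are all assumptions made for this example.

```python
from collections import Counter

def bin_read_counts(read_starts, bin_size, chrom_length):
    """Count reads whose start position falls in each fixed-width bin.

    Hypothetical helper for illustration; positions are 0-based.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = Counter(
        pos // bin_size for pos in read_starts if 0 <= pos < chrom_length
    )
    # Return a dense list so that empty bins show up as zero depth.
    return [counts.get(i, 0) for i in range(n_bins)]

# Toy data: read start coordinates on a 500-bp "chromosome".
read_starts = [5, 12, 48, 101, 110, 115, 130, 160, 199, 305, 450]
depth = bin_read_counts(read_starts, bin_size=100, chrom_length=500)
print(depth)
```

In this toy case the second window holds twice as many reads as the first and the third holds none. A depth-based caller has to decide whether differences like these reflect real copy number changes or sampling noise.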
ReadDepth uses a negative binomial statistical model that reduces the number of false positives identified and “adjust[s] for a number of types of bias, including GC-content, mapability, and other sources of distortion introduced by the preparation and sequencing processes,” the researchers wrote.
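As a rough illustration of what adjusting for GC-content bias can mean, the Python sketch below rescales binned read counts so that windows with similar GC content share the same median. This is a generic, simplified technique chosen for the example. It is not a description of readDepth's actual correction. The function `gc_correct`, the ten GC strata, and the toy numbers are assumptions.

```python
from collections import defaultdict
from statistics import median

def gc_correct(counts, gc_fractions, n_strata=10):
    """Rescale each bin so that every GC stratum has the same median count.

    Hypothetical helper: bins are grouped by GC fraction into n_strata
    equal-width strata, and each count is multiplied by
    (overall median / median of its stratum).
    """
    def stratum(gc):
        return min(int(gc * n_strata), n_strata - 1)

    overall = median(counts)
    by_stratum = defaultdict(list)
    for c, gc in zip(counts, gc_fractions):
        by_stratum[stratum(gc)].append(c)
    stratum_median = {s: median(v) for s, v in by_stratum.items()}

    corrected = []
    for c, gc in zip(counts, gc_fractions):
        m = stratum_median[stratum(gc)]
        # Leave the count unscaled if its stratum median is zero.
        corrected.append(c * overall / m if m > 0 else float(c))
    return corrected

# Toy data: high-GC bins are systematically under-covered.
counts = [100, 110, 90, 50, 55, 45]
gc = [0.42, 0.45, 0.48, 0.71, 0.75, 0.78]
corrected = gc_correct(counts, gc)
print(corrected)
```

After rescaling, the moderate-GC and high-GC windows in the toy data sit at the same level. A drop in depth caused only by base composition is therefore less likely to be mistaken for a copy number loss.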
Furthermore, readDepth uses a multi-core architecture to parallelize data processing, which offers a speed advantage over analogous methods. When benchmarked against CNV-Seq, a similar R-based package for variant detection, readDepth was able to make variant calls in 231 seconds compared to 1,651 seconds for CNV-Seq and showed “considerably higher sensitivity and specificity.”
The developers plan to add features to readDepth that will improve visualization of results and detect overdispersion automatically.
This week, BioInform spoke with Aleksandar Milosavljevic, an associate professor of molecular and human genetics at Baylor and one of the co-authors on the paper. Milosavljevic developed readDepth in conjunction with Chris Miller, a former graduate student at Baylor who is now a staff scientist at Washington University in St. Louis. Below is an edited version of the conversation.
Let’s start off with some background on the study.
We [wanted] a technology that could [identify] structural aberration data from massively parallel sequencing reads comprehensively and accurately. We also could not find a technology that would allow us to [identify copy changes] not from just genomic sequencing reads but from bisulfite-treated reads — which are produced in order to get both genomic and epigenomic information — in the same shot.
Our goal is to have a tool that will allow us to do comprehensive genomic and epigenomic characterization from a single experiment and at the same time allow us to detect not only copy number changes but also integrate breakpoint detection using paired end data.
Our goal here is comprehensive genomic and epigenomic characterization from whole-genome bisulfite-treated data, so that we can get comprehensive genomic characterization, including mutations at every level from base pair to copy number to other structural variants, and at the same time be able to read the epigenome, which is the methylation mark of the DNA.
ReadDepth is one of the tools that we have developed to reach that goal. We are developing other tools that are complementary, for example the PASH 3.0 program, which [we] published three months ago; and a platform to integrate the tools called Genboree.
There seem to be quite a number of open source short-read tools currently available. Why are they insufficient for your needs? What does readDepth do differently?
The issue is statistical modeling. Other tools assume a Poisson distribution of reads with a single parameter, but it turns out that the parameters [change] along the genome, so we had to use another statistical model, called negative binomial, to model essentially two levels of variation. One is the randomness of sampling of the genomic reads and the second is the variation in the parameters of the Poisson distribution. By having a better statistical model, we could then have a more accurate detector.
The second aspect is that we validated this technology not only on reads but also on bisulfite-treated reads, which have different mapability performance. We also integrated information from paired-end reads. So we are not only calling copy number gains and losses, but we can then validate the breakpoints, which may indicate that the copy number gains and losses are due to other structural rearrangements in the genome.
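The two levels of variation described in the answer above can be illustrated with a small simulation. When the Poisson rate itself varies from window to window according to a gamma distribution, the resulting counts follow a negative binomial distribution, and their variance exceeds their mean. The Python sketch below is an illustration under stated assumptions, not code from readDepth. The mean depth of 20, the gamma shape of 5, the seed, and the helper `poisson_sample` are all invented for the example.

```python
import math
import random
from statistics import mean, variance

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Hypothetical helper; adequate for the modest rates used here.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(0)   # fixed seed so the run is repeatable
n_windows = 20000
mean_depth = 20.0        # assumed average reads per window
shape = 5.0              # assumed gamma shape; smaller means more dispersion

# Level 1 only: every window shares one Poisson rate.
fixed_rate = [poisson_sample(rng, mean_depth) for _ in range(n_windows)]

# Levels 1 and 2: each window first draws its own rate from a gamma
# distribution with mean mean_depth, then draws a Poisson count.
varying_rate = [
    poisson_sample(rng, rng.gammavariate(shape, mean_depth / shape))
    for _ in range(n_windows)
]

for label, data in (("fixed rate", fixed_rate), ("varying rate", varying_rate)):
    print(label, "mean:", round(mean(data), 2), "variance:", round(variance(data), 2))
```

With these settings, theory gives a variance close to the mean (about 20) for the fixed-rate counts. For the varying-rate counts it gives roughly mean + mean²/shape = 20 + 400/5 = 100. A caller that assumes plain Poisson noise would treat much of that extra spread as signal, which is one way false positives arise.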
Did you compare readDepth to specific tools?
[We compared it] with the tools that do the Poisson modeling and the references are in the paper. There is one tool published in 2009 in Nature Methods; CNV-Seq published in BMC Bioinformatics in 2009; and then one published in 2010 in BMC Genomics. These are prior tools that didn’t meet our requirements in one way or another.
What are next steps for you?
The next step is the integration of the software with other tools that can do comprehensive genomic and epigenomic characterization. For that purpose we are using the Genboree system and one particular tool we are integrating this with is the PASH program which can get base pair level and indel variants from bisulfite-treated reads.
These two tools are complementary in that PASH does the mapping and inference of smaller scale rearrangements from bisulfite-treated reads and readDepth does higher level structural variation detection.
We also have pipelines for calling methylation levels from bisulfite-treated reads so that we have all three characterizations in a single shot. We get the epigenome, the methylome specifically, we get base pair level and indel variation using PASH and we get structural variations using readDepth.