Researchers from Washington University School of Medicine in St Louis and other institutions have developed a new method for calling de novo insertion, deletion, and point mutations in sequence data from familial and somatic tissue samples more accurately than existing software such as SAMTools and the Genome Analysis Toolkit.
The software, called DeNovoGear, takes a collaborative approach to sample analysis and indel calling and uses "likelihood-based error modeling" to reduce the number of false positive calls, according to a paper published in Nature Methods. It also includes a fragment-based phasing algorithm, which is used to trace the origins of germline mutations.
More explicitly, the paper explains that other methods call genotypes by analyzing samples one at a time and identify de novo variants as incompatible calls between samples, which results in a high number of false positives. DeNovoGear instead jointly analyzes samples in a model-based framework built on a beta-binomial distribution. It calls mutations using information about "individual genotype likelihoods, transmission probabilities, and priors on the probability of observing a polymorphism or de novo mutation at any given site in the genome."
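To make the ingredients named in that quote concrete, here is a minimal Python sketch of a joint trio calculation at a single biallelic site. It is not DeNovoGear's code or its published model: the function names, the plain binomial read model (used here in place of a beta-binomial for brevity), the sequencing error rate, the allele frequency, and the mutation rate are all illustrative assumptions.

```python
from itertools import product
from math import comb

GENOTYPES = (0, 1, 2)  # copies of the alternate allele (hypothetical encoding)


def genotype_likelihood(alt_reads, total_reads, genotype, error=0.01):
    """Binomial probability of the read counts given a genotype (toy read model)."""
    p_alt = (error, 0.5, 1.0 - error)[genotype]
    return (comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1.0 - p_alt) ** (total_reads - alt_reads))


def genotype_prior(genotype, alt_freq=0.001):
    """Hardy-Weinberg prior on a parental genotype (assumed allele frequency)."""
    q = alt_freq
    return ((1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2)[genotype]


def transmission(child, mother, father, mu):
    """P(child genotype | parental genotypes), with per-gamete mutation rate mu."""
    def gamete_alt(parent):
        p = parent / 2.0  # Mendelian chance of passing on the alternate allele
        return p * (1.0 - mu) + (1.0 - p) * mu

    pm, pf = gamete_alt(mother), gamete_alt(father)
    return ((1.0 - pm) * (1.0 - pf),
            pm * (1.0 - pf) + (1.0 - pm) * pf,
            pm * pf)[child]


def denovo_posterior(child, mother, father, mu=1e-8):
    """Posterior mass on trio genotypes that violate Mendelian inheritance.

    Each argument is an (alt_reads, total_reads) pair for one family member.
    """
    total = 0.0
    denovo = 0.0
    for c, m, f in product(GENOTYPES, repeat=3):
        weight = (genotype_likelihood(*child, c)
                  * genotype_likelihood(*mother, m)
                  * genotype_likelihood(*father, f)
                  * genotype_prior(m) * genotype_prior(f)
                  * transmission(c, m, f, mu))
        total += weight
        # With mu = 0 the transmission term is exactly zero for any
        # configuration that cannot arise without a new mutation.
        if transmission(c, m, f, 0.0) == 0.0:
            denovo += weight
    return denovo / total
```

Calling `denovo_posterior((10, 20), (0, 30), (0, 30))` describes a child with 10 of 20 reads supporting the alternate allele and two parents with none in 30 reads each; under these assumed parameters the posterior comes out close to 1. Because all three samples are weighed together, thin evidence in one parent lowers the score rather than producing a hard genotype mismatch, which is the intuition behind joint calling reducing false positives.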
The paper also provides results of tests done using whole exome and whole genome datasets that suggest the approach does result in more accurate variant calls. In one scenario, indels predicted by DeNovoGear from a whole exome sequence data set had a 95 percent validation rate. In other tests using whole genome data, it called more validated de novo single base substitutions than three other tools: GATK, SAMTools, and PolyMutt.
DeNovoGear is one of the software fruits of the 1000 Genomes Project. Donald Conrad, an assistant professor in WUSTL's genetics department and a co-author on the Nature Methods paper, told BioInform that the earliest version was developed in 2008 to analyze data for an early pilot project that the consortium launched while trying to decide whether to sequence unrelated individuals or family trios. His team was tapped to explore the latter.
The researchers didn't immediately publish the approach after its creation because of competing priorities. More recently, however, the team decided that it was "really important to get a manuscript out specifically describing the method so other people could have access to it [and because] we think it's really broadly useful," Conrad said.
For example, it's well-suited for applications in areas such as pediatric genetic disease and cancer research studies, according to Reed Cartwright, an assistant professor of genomics, evolution, and bioinformatics at Arizona State University's Biodesign Institute and a co-author on the Nature Methods paper. "A child with an unusual genetic disease may undergo genomic sequencing to see if the mutations observed have been acquired from the parents or are instead unique to the child," he said. "We can identify these mutations and try to detect which gene may be broken."
Conrad said the software is already being used in at least one study in the UK led by Matthew Hurles, a co-author on the DeNovoGear paper and a researcher in the Wellcome Trust Sanger Institute's Genome Mutation and Genetic Disease group. The project, called the Deciphering Developmental Disorders study, is trying to identify mutations in children with genetic disorders of unknown origin. It's using DeNovoGear, Conrad said, to analyze whole genome or exome sequencing data from study participants and their parents.
Meanwhile, DeNovoGear's developers have begun working on new features for the next release of the software. According to the Nature Methods paper, they are trying to improve its calling performance by implementing new genotype likelihood models for calling new mutation types, such as variable number of tandem repeat mutations and copy-number variations.
They're also implementing methods for calling "frequencies (for example, in mosaic situations), and sample preparations (for example, single cells)," the paper states. Finally, they're working to extend "the inheritance model to cover arbitrary pedigree structures."