Scientists from the University of Utah and colleagues at privately held Omicia have developed a software package that functionally interprets whole-genome sequence data from a relatively small sample and that they claim is more accurate than existing methods.
The partners described the Variant Annotation, Analysis, and Selection Tool, or VAAST, in a paper published last week in Genome Research.
Omicia CEO Martin Reese told BioInform via e-mail that the partners developed VAAST to address a "growing need for reliable tools that enable the use of personal genome information for medical diagnosis and preventive care."
Omicia holds sole commercial rights to VAAST and is integrating it into the company's Genome Analysis System, a platform it is developing to clinically analyze human genomes. It plans to launch the tool in the third quarter under a software-as-a-service model.
Reese said the tool is currently in beta testing with "a number of well-known commercial entities, research organizations, and individuals." He told BioInform's sister publication Clinical Sequencing News that the platform will be used to clinically annotate both whole genomes and more targeted data such as exomes or gene panels (CSN 06/29/11).
VAAST's secret sauce, according to Reese, is that it is able to quickly mine genomic data, prioritize and identify disease-causing genes and their variants, and determine the statistical significance of any hits.
These improvements arise from the developers' strategy of combining two established methods for variant prioritization: the amino acid substitution, or AAS, approach and an aggregative approach that "collapses" rare variants into a single group for analysis.
"VAAST combines variant frequency information and AAS information into a single unified likelihood framework," Reese said. "VAAST can also use pedigree and phase information and places a P-value on hits like Blast ... [and it] is also capable of using rare variants to identify the causes of rare diseases and both rare and common variants to identify the causes of common diseases.
"No other tool has these functionalities," he added.
Meeting the Shifting Needs of Genome Analysis
In the Genome Research paper, the researchers note that due to the rapid rise of next-generation sequencing, "the human genome is no longer a frontier." As a result, evaluating sequence variants "in the context of pre-existing gene annotations" is gaining in importance, but tools that were developed to analyze data from SNP arrays and genome-wide association studies fall short.
It is not "merely a matter of annotating nonsynonymous variants ... [or] predicting the severity of individual variants in isolation," the authors continued. "Rather, the challenge is to determine their aggregative impact on a gene’s function."
While there are tools available for variant prioritization, such as ANNOVAR (Annotate Variation) and SIFT (Sorting Intolerant from Tolerant), these methods often require users to specify search criteria, which "places hard-to-quantify limitations on their performance," the authors wrote. Likewise, aggregative approaches such as CAST (Cohort Allelic Sums Test) and CMC (Combined Multivariate Collapsing) "have remained largely theoretical."
Unlike analysis methods developed for GWAS, "which evaluate the statistical significance of frequency differences for individual variants in cases vs. controls, VAAST evaluates the likelihood of observing the aggregate genotype of a feature given a background dataset of control genomes," the authors note, adding that this approach "greatly improves statistical power, in part because it bypasses the need for large statistical corrections for multiple tests."
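The aggregate-likelihood idea the authors describe can be illustrated with a toy calculation. The Python sketch below is not VAAST's code and omits its AAS weighting, rare-variant collapsing, and significance testing. It assumes simple binomial allele-count likelihoods, and the function names and input layout are hypothetical. For each variant in a feature, it compares a model with one shared allele frequency against a model with separate case and background frequencies, then sums the log-likelihood ratios into a single score for the feature.

```python
import math


def binom_loglik(alt, total, p):
    """Log-likelihood of seeing `alt` alternate alleles among `total`
    chromosomes at allele frequency `p` (binomial coefficient omitted,
    since it cancels in the ratio)."""
    ref = total - alt
    loglik = 0.0
    if alt:
        loglik += alt * math.log(p)
    if ref:
        loglik += ref * math.log(1.0 - p)
    return loglik


def feature_score(variants):
    """Sum per-variant log-likelihood ratios over one feature (e.g. a gene).

    Each variant is a tuple of allele counts:
    (case_alt, case_total, background_alt, background_total).
    This layout is an assumption made for illustration only.
    """
    score = 0.0
    for case_alt, case_total, bg_alt, bg_total in variants:
        # Null model: cases and background share one allele frequency.
        pooled = (case_alt + bg_alt) / (case_total + bg_total)
        null = (binom_loglik(case_alt, case_total, pooled)
                + binom_loglik(bg_alt, bg_total, pooled))
        # Alternative model: each group has its own allele frequency.
        alt = (binom_loglik(case_alt, case_total, case_alt / case_total)
               + binom_loglik(bg_alt, bg_total, bg_alt / bg_total))
        score += 2.0 * (alt - null)
    return score


# A hypothetical gene with two variants seen in 4 case chromosomes
# and 200 background chromosomes.
gene = [(4, 4, 0, 200), (2, 4, 1, 200)]
print(feature_score(gene))
```

Because the evidence is pooled into one score per feature, a scan under this scheme tests one hypothesis per feature rather than one per variant, which is in keeping with the reduced multiple-testing burden the authors describe.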
VAAST's ability to incorporate AAS information, meanwhile, sets it apart from variant aggregation approaches such as CMC and Kernel Based Adaptive Cluster methods, both of which were developed by researchers at Baylor College of Medicine and recently incorporated into the sequence analysis module of Golden Helix's next-generation sequence analysis software (BI 05/20/2011).
Moreover, the tool incorporates AAS data in a manner that "allows it to score more SNVs than existing AAS methods such as SIFT," the researchers wrote. In fact, when VAAST was run without any AAS information, its accuracy fell from 95 percent to 80 percent.
VAAST scores variants by combining "variant frequency data with AAS effect information on a feature-by-feature basis," where a feature is defined as "one or more user-defined region — [genes and conserved sequence regions for example] — of the genome," the researchers explain in the paper.
VAAST can score coding and non-coding variants and evaluate their cumulative impact, identify rare variants linked to rare genetic diseases, and use both rare and common variants to identify genes associated with common diseases that current software tools fail to detect.
According to the paper, SIFT and ANNOVAR identified 57.5 percent and 71 percent, respectively, of the disease-causing variants in a previously published dataset of 1,454 known disease-causing variants from the Online Mendelian Inheritance in Man database, while VAAST identified 98 percent.
Furthermore, when the same analysis was performed using 1,454 non-synonymous variants from the 1000 Genomes Project that aren't linked to diseases, SIFT and ANNOVAR incorrectly identified 11.9 percent and 0.8 percent as deleterious, respectively, while VAAST incorrectly identified 8.12 percent.
Although ANNOVAR outperformed VAAST in the false-positive test, the researchers reported that VAAST has an overall accuracy of 94.9 percent, compared with 79.8 percent for SIFT and 88.3 percent for ANNOVAR.
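One way to see how the two tests might combine into a single accuracy figure is sketched below in Python. It assumes, and this is only an assumption, that overall accuracy is the mean of sensitivity and specificity, which equals the fraction of correct calls when the two test sets are the same size, as they are here at 1,454 variants each. The function name is hypothetical.

```python
def balanced_accuracy(true_positive_pct, false_positive_pct):
    """Mean of sensitivity and specificity, with all values in percent.

    Assumes the disease and non-disease test sets are equally sized,
    so this also equals the overall fraction of correct calls.
    """
    specificity_pct = 100.0 - false_positive_pct
    return (true_positive_pct + specificity_pct) / 2.0


# VAAST figures quoted in the article: 98 percent of disease-causing
# variants found, 8.12 percent of benign variants wrongly flagged.
print(f"{balanced_accuracy(98.0, 8.12):.2f}")  # 94.94
```

With VAAST's numbers this gives 94.94 percent, in line with the reported 94.9 percent. Applied to the SIFT and ANNOVAR percentages quoted here, the same formula gives values lower than the reported 79.8 and 88.3 percent, so the sketch should not be read as the paper's own calculation.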
VAAST also addresses differences among genomes that arise from varying sequencing platforms, depths of coverage, and variant-calling methods, which are a source of false positives, primarily in low-complexity and repetitive regions of the genome.
Among its capabilities is a runtime option for masking variants, described in the paper as an option that excludes a list of nucleotide sites from the calculation based on information obtained prior to the analysis. Users specify a read length and whether the reads are paired or unpaired; VAAST then identifies all the variants that meet these criteria and excludes them from its calculations.
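As a rough illustration of what masking amounts to, the Python sketch below drops any variant that falls on a precomputed list of sites before scoring. The record layout and the function name are hypothetical, and the sketch does not show how VAAST derives its masked sites from read length and pairing.

```python
def mask_variants(variants, masked_sites):
    """Return the variants whose (chromosome, position) is not masked.

    `variants` is a list of dicts with "chrom" and "pos" keys, and
    `masked_sites` is an iterable of (chromosome, position) pairs.
    Both layouts are assumptions made for illustration.
    """
    masked = set(masked_sites)
    return [v for v in variants if (v["chrom"], v["pos"]) not in masked]


# Hypothetical calls, one of which sits at a site flagged in advance.
calls = [
    {"chrom": "chr1", "pos": 100},
    {"chrom": "chr1", "pos": 200},
    {"chrom": "chrX", "pos": 100},
]
kept = mask_variants(calls, [("chr1", 200)])
print(len(kept))  # 2
```

Holding the masked sites in a set keeps each lookup constant-time, which matters when the list of excluded positions spans a whole genome.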
The developers also discussed VAAST's ability to find mutations using a "modest" sample size. In one test involving data from individuals with Miller syndrome, the algorithm successfully identified both genes associated with the disease using a cohort of two related individuals.
A separate paper also published last week described VAAST's use in an international effort to identify mutations responsible for a newly discovered childhood disease, tentatively called Ogden syndrome, that is characterized by an aged appearance, craniofacial abnormalities, and cardiac arrhythmias among other symptoms.
In the study, published in the American Journal of Human Genetics, the team used X-chromosome exon capture, next-generation sequencing, and VAAST to identify a disease-causing mutation in the NAA10 gene associated with Ogden syndrome in children from two unrelated families.
Study author Gholson Lyon of the Children's Hospital of Philadelphia said in a statement that the study is one of the first times a personal genome analysis tool has identified a previously unknown syndrome.
Furthermore, he noted that VAAST identified the causative mutation using data from just two individuals — an affected boy in one family and a mother who was a carrier in an unrelated family.
The study is proof of principle that VAAST "can identify disease-causing mutations with greater accuracy, using DNA from far fewer individuals, more rapidly, than was previously possible," Lyon said, adding that his team is now using VAAST in research efforts involving rare Mendelian disorders as well as common disorders such as ADHD and autism.
Omicia clearly sees a market opportunity for the use of VAAST in the clinical setting, but the company faces competition in this quickly evolving area. GenomeQuest recently launched a "clinical decision-support system," called GQ-Dx, for whole-genome diagnostics that analyzes information about variations and changes in genes and proteins to improve disease treatment.
Likewise, Knome recently launched a new version of its kGAP genome interpretation engine, which serves as the foundation for the company's service model. Knome applies kGAP in projects with pharmaceutical and academic clinical researchers to identify gene variants involved in drug response, cancer, and other diseases.