Univ. of Washington Duo Find Disparate Results for Four Sequence Alignment Tools

By a GenomeWeb staff reporter

NEW YORK (GenomeWeb News) – There is a lack of agreement between the results generated by different alignment tools that compare and contrast multiple genome sequences, according to a recent paper in the early, online edition of Nature Biotechnology.

Martin Tompa, a computer science and engineering researcher at the University of Washington, who is also affiliated with the university's department of genome sciences, and doctoral student Xiaoyu Chen compared four sequence alignment tools using 554 million bases of sequence data from 28 vertebrate genomes. Rather than producing consistent results, the pair found a lack of conformity between alignments generated from the same vertebrate dataset, particularly when comparing species that are not closely related or looking at non-protein coding parts of the genome.

"We discovered that there's a disturbingly low level of agreement between genome alignments produced by different tools," Tompa said in a statement. "What this should suggest to biologists is that they should be very cautious about trusting these alignments in their entirety."

The data used for the new comparison stemmed from research done by groups working within the ENCODE consortium's Multi-Species Sequence Analysis team. That study, which appeared in Genome Research in 2007, involved aligning one percent of the human genome with genome sequences for 27 other vertebrates.

For the current paper, Tompa and Chen delved into the details of this 554 million base pair alignment, looking at the agreement, coverage, and accuracy of the sequence alignment tools used in the ENCODE study: Threaded Blockset Aligner (TBA), Multiple Limited Area Global Alignment of Nucleotides (MLAGAN), Mavid, and Pecan.

"What makes these alignments an unprecedented test bed for comparisons is that four expert teams used four different methods to align the same 28 vertebrates sequences," Chen and Tompa wrote.

Unlike the initial analyses, though, the pair assessed all of the aligned vertebrate sequence data rather than homing in on mammalian data.

Unexpectedly, the researchers found low agreement between the alignments, especially for untranslated regions, introns, and intergenic sequences.

In general, they found lower agreement, coverage, and accuracy with increasing species distance from humans, though agreement was low even when comparing alignments of human and mouse sequences.

"Such low levels of agreement indicate that constructing a reliable whole-genome multiple sequence alignment remains a significant challenge," the duo noted, "particularly for non-coding regions and distantly related species."

On the whole, the pair's analyses, which relied on the statistical method StatSigMA-w, suggest that Pecan, a tool from the European Bioinformatics Institute, provided the most accurate results of the four methods tested.

Based on the alignment differences and accuracy deficits detected in the new paper, Chen and Tompa argued that researchers need to take a critical look at sequence alignment tools. They should be particularly vigilant about double-checking alignments that involve sequences from distantly related species or non-coding regions of the genome. In the long term, they say, evaluating alignments in this fashion may help to improve the alignment approaches.

"I think we're all interested in having a better understanding of which methods work the best and how to make them better," Tompa said in a statement.
