Symbolic Scatter Plot Helps Visualize Patterns Within DNA Sequence


There's more than one way to look at a DNA sequence, and it doesn't have to involve an endless string of letters. North Carolina State University's David Cox has created what he's termed a "symbolic scatter plot" that lets people see patterns within a DNA sequence with the naked eye. One of his main findings was that, compared with Boston University's Tandem Repeats Finder (TRF), his technique found more interesting patterns and, in some cases, repeats that TRF missed. Cox will present his research this summer at the 2009 International Conference on Bioinformatics and Computational Biology in Las Vegas.

Tandem repeats are significant in the molecular pathology of many diseases, including Huntington's disease, and Cox's technique may help researchers identify the small causative changes in DNA patterns more effectively than computational sequence analysis tools like TRF or Blast. "My thesis is that even though we have good software for finding those patterns, it's still unmatched when compared to the human visual system," he says.

Cox, who is a graduate student working on his PhD in computer science, says he developed the scatter plot with a visualization tool in mind. "Most of the bioinformatics algorithms take a statistical approach and analyze DNA sequences statistically, looking for matches that are, from a statistical perspective, not random," he says. "What I wanted to do was to actually be able to see the matches in some fashion, and in looking at the techniques that were currently available, I didn't find any that were particularly good at it."

His technique starts out similar to Blast, he says, in that it takes the sequence at hand and breaks it up into small words. Whereas Blast computationally searches a database for matches to those words, his method simply maps them. In his case the words are 3-mers, each of which is one of 64 possibilities because there are 64 possible combinations of three nucleotides (4 × 4 × 4). Each 3-mer is assigned a number from zero through 63, which serves as the y-coordinate of its point on the scatter plot. The x-coordinate is the position at which the 3-mer appears in the genetic sequence. Cox designed the symbolic scatter plot so that 3-mers that correspond to the same amino acid are adjacent to each other.
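The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cox's implementation: the article does not say how the 64 positions are ordered, so the sketch sorts 3-mers by the amino acid they encode in the standard genetic code, and it assumes a sliding window that advances one nucleotide at a time. The names `CODON_TO_Y` and `symbolic_scatter_points` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# for each of the three codon positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# ASSUMPTION: order the 64 3-mers by amino acid, then alphabetically,
# so that 3-mers encoding the same amino acid get adjacent y-values.
# The article does not give the actual ordering Cox used.
ORDERED_CODONS = sorted(CODON_TO_AA, key=lambda c: (CODON_TO_AA[c], c))
CODON_TO_Y = {codon: y for y, codon in enumerate(ORDERED_CODONS)}


def symbolic_scatter_points(sequence, step=1):
    """Return (x, y) points: x is the 3-mer's start position in the
    sequence, y is its number from 0 to 63.

    ASSUMPTION: step=1 gives overlapping 3-mers; use step=3 for
    non-overlapping ones. 3-mers containing letters other than
    A, C, G, T (such as N) are skipped.
    """
    sequence = sequence.upper()
    points = []
    for x in range(0, len(sequence) - 2, step):
        y = CODON_TO_Y.get(sequence[x:x + 3])
        if y is not None:
            points.append((x, y))
    return points


# A short CAG repeat, the kind of tandem repeat linked to Huntington's disease.
points = symbolic_scatter_points("CAGCAGCAGCAG")
```

In this sketch a tandem repeat shows up as a set of y-values that recurs at a fixed interval along the x-axis, which is the kind of regularity a viewer could pick out by eye once the points are handed to any plotting library.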

What struck him when he first plotted various parts of the human genome were the variable and interesting patterns he saw. "What I'm doing now is to look at those patterns and try to understand a little bit as to what they mean," Cox says. "Are they important biologically and, if so, what do they mean?"

He says that the scatter plot is basically a research tool, and won't replace currently available software simply because there is a limit to visualization. "For example, if you're comparing two sequences from two distantly related organisms, then the nucleotides might not match at all and in that case, you have to rely on some sort of assumptions about how frequently the nucleotides mutate and how frequently insertions and deletions occur in order to come to some conclusions," Cox says. "Visualization is not going to help that."
