Having Fun While Contributing to Genetic Research is the Aim of New Sequence-Alignment Game


By Uduak Grace Thomas

A research team at McGill University has launched a web-based game that aims to do for multiple sequence alignment what the University of Washington's Foldit does for protein folding.

In the new game, called Phylo, players are expected to find the best possible alignment for sequences that are represented as rows of colored squares, with one square per nucleotide, in a grid that represents the genome. Gamers can move entire sequences or individual boxes to the right or to the left in an effort to match colors in one row to the corresponding box in the row below.

Phylo, which went online this week, contains 2,000 sequence alignment puzzles drawn from the University of California, Santa Cruz's Genome Browser. The puzzles are associated with 150 diseases: cancers, metabolic disorders, and diseases of the digestive, heart and circulatory, blood and immune, sensory, and brain and nervous systems. Players can align sequences of up to eight species at a time in the game.

Jérôme Waldispühl, an assistant professor of computer science at McGill and one of the game's developers, told BioInform that by involving the public, he expects to improve on the quality of multiple sequence alignment because human beings have the ability to "recognize patterns and solve visual problems efficiently."

To illustrate, he explained that humans typically look for patterns when solving a puzzle rather than trying to evaluate every possible configuration, as traditional alignment algorithms do. That exhaustive approach makes those algorithms too computationally expensive to run on problems involving large numbers of sequences.
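A rough sense of that cost can be had from a back-of-the-envelope calculation. The sketch below assumes the "traditional" approach is exact dynamic programming over all sequences at once, which the Phylo team did not specify; in that textbook formulation, aligning N sequences of length L fills a table of (L + 1)^N cells, and each cell looks back at 2^N - 1 neighbors. The function names are illustrative only.

```python
def dp_cells(num_sequences: int, length: int) -> int:
    """Cells in the table for exact dynamic-programming alignment of
    num_sequences sequences, each of the given length (textbook model)."""
    return (length + 1) ** num_sequences


def predecessors_per_cell(num_sequences: int) -> int:
    """Neighboring cells consulted per cell: every non-empty choice of
    which sequences advance by one position."""
    return 2 ** num_sequences - 1


if __name__ == "__main__":
    # Compare a pairwise alignment with an eight-species puzzle,
    # using a hypothetical sequence length of 100 nucleotides.
    for n in (2, 3, 8):
        print(n, dp_cells(n, 100), predecessors_per_cell(n))
```

The table size grows exponentially with the number of sequences, which is why practical tools fall back on heuristics and why a human eye for patterns might help.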

"We want to take advantage of this capacity of humans," he said, as well as to encourage the general public to make meaningful contributions to genetic research while "having some fun."

While many multiple sequence alignment algorithms use heuristics to align sequences, Waldispühl and colleagues note on the Phylo website that even this approach is extremely computationally intensive for achieving an optimal alignment.

Since Phylo relies on alignments that have already been processed by a heuristic algorithm, "we allow the user to optimize where the algorithm may have failed," the developers noted on the website.

When selecting sequences for Phylo, Waldispühl said that the team focused specifically on promoter sequences, which control gene expression, because mutations in this area are often linked with diseases.

The sequences are stored in a MySQL database that connects to a client written in Flash, which transforms the sequences into the colored boxes representing the genetic code. The game is housed on a server that can host about 3,000 players at a time.

Gamers can play Phylo as guests but need to create accounts if they want to keep a record of their alignments.

Players are scored based on the number of successful color matches they make and are penalized for mismatches as well as for gaps in the sequences.

Scores are calculated by comparing the aligned sequences with a common ancestral sequence: points are added for regions where the sequences match and subtracted for regions where they do not.
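The scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not Phylo's actual scoring code: the point values (+1 for a match, -1 for a mismatch, -2 for a gap), the "-" gap character, and the function names are all assumptions, since the article does not give the real weights or how the ancestral sequence is inferred.

```python
MATCH = 1       # assumed reward for a column that agrees with the ancestor
MISMATCH = -1   # assumed penalty for a differing nucleotide
GAP = -2        # assumed penalty when exactly one side has a gap
GAP_CHAR = "-"


def score_row(ancestor: str, row: str) -> int:
    """Score one aligned row against the ancestral sequence, column by column."""
    if len(ancestor) != len(row):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(ancestor, row):
        if a == GAP_CHAR and b == GAP_CHAR:
            continue  # gap opposite gap: nothing to compare
        if a == GAP_CHAR or b == GAP_CHAR:
            total += GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


def score_alignment(ancestor: str, rows: list[str]) -> int:
    """Total score of all rows in the puzzle against the same ancestor."""
    return sum(score_row(ancestor, row) for row in rows)


if __name__ == "__main__":
    print(score_alignment("ACGT-A", ["ACGTTA", "A-GT-A"]))
```

Under these assumed weights, sliding a box left or right changes which columns match, so a player's move raises or lowers the total in the same way the game's score reacts to each shift.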

When players are able to outperform the score obtained by the computer, they are considered on "par" and can move on to the next set of sequences. They can also align the same sequences multiple times to improve on their score.

As an added feature, players can choose to align sequences that have been linked to specific diseases.

The Phylo developers said that the player-optimized data "will eventually be re-introduced back into the global alignment as an optimization."

Waldispühl said that the user-derived alignments will be "re-inserted into the original multiple sequence alignment" taken from UCSC's Genome browser.

"We are currently in touch with [UCSC] to send them back our results once the global improvement will be confirmed," he said.

He also noted that it may be possible to discover "some strategy to improve heuristic algorithms" but that it is too early to tell.

The team also plans to embed the game in Facebook, which Waldispühl said should occur in a few months. Applications for smartphones are also in the works. In addition, he said, while the game's description and directions are currently available in English and French, the developers hope to add more languages to appeal to a more global audience.

Waldispühl said that the team has already identified 500 additional alignment puzzles that it plans to include in the game, "depending on how fast it grows" as well as sequences that are associated with other disease types.

He also noted that while he expects the game to be educational for some players, even the "pure gamer" isn't wasting time because finding the optimal sequence alignment could provide a better understanding of disease mechanisms and lead to the development of better therapies.

According to Waldispühl, games like Foldit are fun, but "they still target some sort of specialized or technical audience," whereas with Phylo, "we wanted to abstract a problem to a game so that people can play" whether they are interested in the motivation behind it or not.
