Online Games from Scripps Team Seek to Improve Gene-Disease Links, Phenotype Predictions


Three online games developed by a research team from the Scripps Research Institute aim to apply crowdsourcing approaches to improve current knowledge about gene-disease links and combinations of genes that are associated with particular phenotypes.

The games — Combo, Dizeez, and GenESP — were the subjects of two posters at this year’s Intelligent Systems for Molecular Biology conference held in Long Beach, Calif., July 15-17.

Combo is meant to address the difficulty of building classifiers for high-dimensional datasets, Benjamin Good, a research associate in the institute’s department of molecular and experimental medicine and one of the developers, explained to BioInform at the conference. Such classifiers can be used to find consistent gene patterns in the data that could serve as predictors for particular phenotypes.

The game does this by building decision trees using sets of five genes, which, when combined, could provide predictive patterns for phenotypes, he said.

So far, Combo includes datasets for two phenotypes: breast cancer metastasis, in which players can identify gene expression signatures in tumors that indicate distance to metastasis; and the developmental disorder craniosynostosis, in which players identify genes that are dysregulated in the developing calvaria of people with skull malformations.

In the game, players select a "hand" of five genes from a pool of genes presented on a board. As each player selects a gene, a score is determined by using the selected genes to train machine-learning algorithms to classify real biological samples. The better the genes reflect the phenotype in question, the better the score.

The round ends when each player has selected five genes. The scores and the decision trees with the expected outcomes based on the genes each player selected are then displayed.
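The scoring idea described above can be illustrated with a minimal sketch. It is not Combo's actual implementation: it assumes binary phenotype labels, uses a one-level decision tree (a "stump") in place of whatever learners the game runs, and scores a hand by leave-one-out accuracy. The gene names, expression values, and the functions `fit_stump` and `score_hand` are all hypothetical, and the toy hands hold fewer than five genes for brevity.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]  # gene symbol -> expression value (toy data)


def fit_stump(samples: List[Sample], labels: List[int],
              genes: List[str]) -> Callable[[Sample], int]:
    """Fit a one-level decision tree over the given genes (labels are 0 or 1)."""
    # Baseline rule: always predict the majority class.
    majority = int(2 * sum(labels) >= len(labels))
    best_errors = sum(y != majority for y in labels)
    best_rule = None
    for gene in genes:
        values = sorted({s[gene] for s in samples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for high_label in (0, 1):
                errors = sum(
                    (high_label if s[gene] > threshold else 1 - high_label) != y
                    for s, y in zip(samples, labels)
                )
                if errors < best_errors:
                    best_errors = errors
                    best_rule = (gene, threshold, high_label)
    if best_rule is None:
        return lambda sample: majority
    gene, threshold, high_label = best_rule
    return lambda sample: high_label if sample[gene] > threshold else 1 - high_label


def score_hand(samples: List[Sample], labels: List[int], hand: List[str]) -> float:
    """Leave-one-out accuracy of a stump trained only on the genes in the hand."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predict = fit_stump(train_x, train_y, hand)
        correct += predict(samples[i]) == labels[i]
    return correct / len(samples)


# Hypothetical toy dataset: GENE_A tracks the phenotype, GENE_B does not.
SAMPLES = [
    {"GENE_A": 1.0, "GENE_B": 5.0},
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 3.0, "GENE_B": 4.0},
    {"GENE_A": 7.0, "GENE_B": 2.0},
    {"GENE_A": 8.0, "GENE_B": 6.0},
    {"GENE_A": 9.0, "GENE_B": 3.0},
]
LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for hand in (["GENE_A"], ["GENE_B"], ["GENE_A", "GENE_B"]):
        print(hand, score_hand(SAMPLES, LABELS, hand))
```

Scoring on held-out samples rather than on the training data is a deliberate choice in this sketch: with only a handful of genes and many samples, it rewards hands that generalize rather than hands that merely fit.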

Additionally, each gene is marked with an icon that provides additional information such as its function and the biological processes in which it participates.

“The hope is that we can do this over many subsets of the data with many different experts and we can use the game to capture the knowledge that isn’t in these databases to improve the results of classifiers,” Good said.

He said that the developers are considering entering a classifier generated by Combo into the Critical Assessment of Genome Interpretation contest, which assesses computational methods for predicting the phenotypic impacts of genomic variation (BI 11/12/2012).

“To get to that point, I think we have a lot of work to do in terms of making it more engaging [and] fun. That’s where the immediate next steps are,” he said.

The Combo developers are actively looking for additional datasets to add to the game.

Dizeez and GenESP, meanwhile, are aimed at collecting gene-disease associations that may exist in text form in published biological and biomedical literature but aren’t currently available in structured databases where computational biologists can use them, Salvatore Loguercio, also a research associate in the institute’s department of molecular and experimental medicine and one of the developers of the games, explained to BioInform.

Dizeez is structured like a quiz, and gamers can select a particular disease area or protein family or answer questions from both categories. Players are shown one gene and five diseases and are expected to pick the disease that is linked to the gene. They have a total of one minute to select as many correct pairs as possible; they are awarded points for each correct match and penalized for each wrong one.

Matches marked as incorrect by the game could be novel gene-disease relations that researchers in the community are aware of but haven’t included in relevant data repositories, Loguercio said.

In a blog post describing the game, the developers explain that “when we analyze the game logs in aggregate, we expect that players’ answers will generally reinforce what’s already known. But given enough game player data, we also expect that we’ll see multiple instances of gene-disease links that aren’t reflected in current annotation databases. And these are candidate novel annotations.”

In fact, according to preliminary data in the ISMB poster, 4,585 unique gene-disease associations have been generated so far from 713 Dizeez games, and 224 gene-disease links that had been provided more than once during play were not found in either the Online Mendelian Inheritance in Man or PharmGKB databases.
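The aggregation step behind those figures can be sketched roughly as follows. This is an illustration under stated assumptions, not the developers' pipeline: game logs are assumed to reduce to a flat list of (gene, disease) answers, the reference databases are assumed to be available as a set of known pairs, and the function name `candidate_novel_links`, the `min_count` parameter, and all gene and disease names are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (gene symbol, disease name)


def candidate_novel_links(answers: Iterable[Pair], known: Set[Pair],
                          min_count: int = 2) -> List[Tuple[Pair, int]]:
    """Return pairs given at least `min_count` times that are absent from `known`.

    Results are ordered by descending count, then alphabetically.
    """
    counts = Counter(answers)
    candidates = [
        (pair, n) for pair, n in counts.items()
        if n >= min_count and pair not in known
    ]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical placeholder data; no real gene-disease claims are implied.
KNOWN = {("GENE_A", "disease X")}
ANSWERS = [
    ("GENE_A", "disease X"), ("GENE_A", "disease X"),  # already annotated
    ("GENE_B", "disease Y"), ("GENE_B", "disease Y"), ("GENE_B", "disease Y"),
    ("GENE_C", "disease Z"),                           # seen only once
    ("GENE_D", "disease X"), ("GENE_D", "disease X"),
]

if __name__ == "__main__":
    for pair, n in candidate_novel_links(ANSWERS, KNOWN):
        print(pair, n)
```

The default `min_count=2` mirrors the "more than once" criterion reported in the poster; a stricter threshold would trade fewer candidates for higher agreement among players.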

GenESP also attempts to find new links between genes and diseases, although it takes a slightly different approach.

Loguercio explained that for this game, players are paired with partners and both are shown a disease name. Players then have to independently enter genes that are associated with the disease and are given points for those genes they have in common.
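The matching rule for GenESP amounts to a set intersection, as in the short sketch below. The details are assumptions rather than the game's documented behavior: one point per shared gene, matching that ignores case and surrounding whitespace, and the hypothetical function names `normalize` and `shared_genes`. The gene symbols are placeholders.

```python
from typing import Iterable, Set


def normalize(symbol: str) -> str:
    """Normalize a typed gene symbol so trivial variations still match."""
    return symbol.strip().upper()


def shared_genes(player_a: Iterable[str], player_b: Iterable[str]) -> Set[str]:
    """Return the genes both partners entered, ignoring blank entries."""
    set_a = {normalize(g) for g in player_a if g.strip()}
    set_b = {normalize(g) for g in player_b if g.strip()}
    return set_a & set_b


if __name__ == "__main__":
    # Hypothetical entries from two partners shown the same disease name.
    common = shared_genes(["gene_a", " GENE_B ", "GENE_C"],
                          ["GENE_B", "GENE_A", "GENE_D", ""])
    print(sorted(common), "points:", len(common))
```

Requiring independent agreement between two players is what gives each shared gene some weight as evidence, since neither partner sees the other's entries.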

So far, the developers don’t have enough data from GenESP games to suggest possible new gene-disease links, Loguercio said.

Other online games that leverage community expertise to address biological challenges include the University of Washington’s Foldit, an online protein-folding game, and Phylo, a game developed by a team at McGill University in which players are expected to find the best possible alignment for sequences that are represented as rows of colored squares (BI 12/3/2010).

Recently, members of Foldit's online gaming community solved the structure of a protein-cutting enzyme from an AIDS-like virus whose configuration had eluded researchers for more than a decade (BI 9/23/2011).
