Recently, researchers from the UK's Institute of Cancer Research released a freely available database of experimental data collected from cancer studies that they believe will help scientists identify and prioritize genetic targets for cancer therapies faster and more efficiently.
According to its developers, the so-called CanSAR database, which was developed with funds from Cancer Research UK, combines information from patients; clinical trials; data from genetic, biochemical, and pharmacological repositories; and more in a single database. It uses artificial intelligence-based methods "to draw paths of knowledge" between these data points, enabling users to "predict risks and opportunities and make drug-relevant suggestions that can be tested in the lab and take us closer to a drug," Bissan Al-Lazikani, team leader of ICR's computational biology and chemogenomics group and lead researcher for the CanSAR project, said in a statement.
Her team, which is part of ICR's cancer therapeutics unit, began developing CanSAR roughly three or four years ago to provide more targeted methods of identifying candidate cancer drug targets for ICR's therapeutics division. "One of the needs that we had here as part of our drug discovery program is to be able to select good drug targets and to find out all the relevant information about them and to make rational decisions about how to progress a gene from just a hint that it's involved in cancer to an actual drug," she told BioInform.
Large-scale cancer genomics studies and other public projects, such as those exploring the three-dimensional structures of proteins, have produced an abundance of information. But attempting to access and use that data to identify potential treatment targets proved problematic, she said, because, as is often the case, the data is siloed in several disparate resources and across multiple disciplines.
CanSAR provides the "bridge," linking "raw gold mines of genetic data to a whole raft of independent chemistry, biology, patient data and disease information," and it makes it available in one place, Al-Lazikani said. It's also designed to provide "a common language" so that "different disciplines like the genomics and chemistry can actually start talking with each other and you can link them with each other."
With the information under one roof, it's possible to generate profiles of effective drug targets based on the features of genes that have been used in successful drug development studies, and then to apply these profiles to new cancer genes as they are discovered. The idea is that a match between two profiles could indicate that the new gene might be worth further study as a potential drug target.
The depth of information in CanSAR makes it possible to combine multiple parameters such as information about protein 3D structures, interactions with other proteins, modes of communication within a cell, data about the chemical environment of the gene, and so on, Al-Lazikani said. Profiles of successful targets are used to train the artificial intelligence methods embedded in the database so that when it's presented with the profile of a new gene, it can simply match it up against what it knows successful gene targets look like, she explained.
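To make the profile-matching idea concrete, here is a minimal sketch in Python. It is not CanSAR's actual method, which the article does not describe in detail. It assumes each gene can be summarized as a short numeric feature vector, averages the vectors of known successful targets into a reference profile, and ranks new genes by cosine similarity to that reference. The feature names, gene names, and values are all hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical feature order (illustrative only, not CanSAR's schema):
# [has 3D structure, scaled interaction count, ligandable pocket, scaled active compounds]

def centroid(profiles):
    """Average a list of equal-length feature vectors into one reference profile."""
    n = len(profiles)
    width = len(profiles[0])
    return [sum(p[i] for p in profiles) / n for i in range(width)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(successful, candidates):
    """Score each candidate gene against the successful-target profile, best first."""
    reference = centroid(successful)
    scored = [(name, cosine(vec, reference)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up toy values, not real measurements.
successful_targets = [
    [1.0, 0.8, 1.0, 0.9],
    [1.0, 0.6, 1.0, 0.7],
]
new_genes = {
    "GENE_A": [1.0, 0.7, 1.0, 0.8],  # resembles the successful targets
    "GENE_B": [0.0, 0.9, 0.0, 0.0],  # many interactions, little else
}

ranking = rank_candidates(successful_targets, new_genes)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

A production system would presumably use a trained classifier over far more features rather than a single averaged profile, but the principle of comparing a new gene against what successful targets look like is the same.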
CanSAR currently contains more than 8 million experimentally derived measurements, nearly 1 million biologically active chemical compounds, and data from over a thousand cancer cell lines, as well as drug target information from the human genome and model organisms. Since it launched, it has been used in a number of projects, including one study published earlier this year by Al-Lazikani and colleagues at the ICR.
According to the abstract of the paper, which was published in Nature Reviews Drug Discovery in January, the researchers used CanSAR to "evaluate an exemplar set of 479 cancer-associated genes." Al-Lazikani told BioInform that the researchers identified 46 genes from the set with druggable potential that haven't yet been studied. The "average researcher [probably] would not have had enough information on their fingertips to be able to distinguish them from the rest of the genes," she said. That’s where CanSAR's value comes through. "It provides that information."