
AstraZeneca's Hugh Salter on Making Toxicogenomics Informatics Work


As the manager of bioinformatics at AstraZeneca’s department of molecular sciences in Huddinge, Sweden, Hugh Salter has evaluated a number of available methods for analyzing toxicogenomics data from microarray experiments. In a paper he co-authored earlier this year in Current Opinion in Drug Discovery & Development, Salter argued that the field of toxicogenomics informatics is still in its infancy. While the analysis methods are similar to those used for other DNA microarray experiments, Salter noted that toxicogenomics presents a number of unique informatics challenges that remain unsolved. Drug response is likely to be a more dynamic process than expression changes in different tumor types, for example, and may therefore be more difficult to model and predict. In addition, he wrote, “toxicogenomics (arguably) aims to provide overall prediction, rather than modeling a set of response subtypes, and how generally feasible this is remains to be seen.”

A particular limitation for the field, according to Salter, is the lack of public toxicogenomics datasets. Unlike other research areas, like yeast and tumor biology, where gene expression data is often made publicly available, toxicogenomics experiments tend to be performed within the context of commercial drug development, and the pharma and biotech companies conducting those experiments are loath to part with their data. “As little public and technologically comparable results are available,” Salter wrote, “there is no consensus as to which of the many possible probe selection, data reduction, and non-supervised clustering methods is best for toxicogenomics.” In a recent interview with BioInform, Salter expanded on the points he raised in his paper, assessed the current state of the art, and discussed what it will take to make predictive toxicology a reality.
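The "probe selection, data reduction, and non-supervised clustering" pipeline Salter refers to can be illustrated with a minimal sketch. The data below are simulated, and the particular choices (variance-based gene filtering, Euclidean distance, average-linkage hierarchical clustering) are generic illustrations of the class of methods under discussion, not the approach used at AstraZeneca:

```python
# Minimal sketch of an unsupervised clustering pipeline for expression
# profiles. All data are simulated; a real toxicogenomics study would
# start from normalized microarray intensities.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Simulate two hypothetical toxicity classes: 6 samples x 50 genes each,
# with a mean shift on 10 "responsive" genes in the second class.
class_a = rng.normal(0.0, 1.0, size=(6, 50))
class_b = rng.normal(0.0, 1.0, size=(6, 50))
class_b[:, :10] += 3.0  # hypothetical marker genes respond in class B

profiles = np.vstack([class_a, class_b])

# Data reduction: keep the 20 most variable genes (a common filter).
variances = profiles.var(axis=0)
top = np.argsort(variances)[-20:]
reduced = profiles[:, top]

# Unsupervised hierarchical clustering, cut into two groups.
dist = pdist(reduced, metric="euclidean")
tree = linkage(dist, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

With a signal this strong the two simulated classes separate cleanly; as Salter notes, there is no consensus on which combination of filtering, distance metric, and clustering method works best on real toxicogenomics data.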

What are the main differences in the informatics necessary to analyze large sets of gene expression data for target discovery rather than large sets of gene expression data for toxicogenomics?

It depends to some extent on what it is you’re trying to get out of the system. For toxicogenomics, you may want specific genes instead of building a predictive system. In target discovery, there are situations where you want to build a predictive system.

So, for example, you could have an in vivo pharmacology model and want to predict an efficacy readout rather than a toxicology readout. There’s also the mode where you could be doing a toxicology (or pharmacological) study but you want to get out the explanation for why those genes in particular responded. So this brings in the distinction of two modes of doing the analysis. One is a mechanistic exploration of the data, and a typical result of that would be to identify potential candidate targets.

There is also the predictive modeling situation, where the actual identities of the genes that describe the model are not necessarily of interest. The two are closely related because I think it’s difficult to get a complete buy-in from toxicologists for these systems unless you can, as well as knowing that certain genes are predicted to be involved, also offer a rational explanation for why they’re predicted.

When you speak of predictive toxicology, then, is that purely computational or does it still involve a large degree of wetlab experimentation?

There are different ways you might want to do predictive toxicology. There’s the early stage where it allows you a sort of surrogate measure: In the same way that you might use cell permeability as an estimation for eventually what will happen to bioavailability, in theory you can use cellular systems as surrogates for what will eventually happen to toxicity. We’re not anywhere close to being able to understand exactly what’s going on, but that’s the eventual aim.

How far off do you think something like that may be?

It depends who you believe. I think you can make the measurements. Time will tell as to whether or not they are accurate. So you can spend a certain amount of time being retrospective, which we are, and your predictions can be quite accurate. But they’re not the same compounds that you’re going to be testing and ranking in the future, and there is a lot of chemical space.

What kind of criteria are you using to prove the effectiveness of the techniques that you have on hand right now?

Well, an obvious system is that once you’ve identified a set of predictive markers, then you can do more accurate measures than microarrays to understand what the basis is for them being predictive markers. So you can bring every other technique that you can think of to bear on the problem; not just transcript analysis but also proteomics and so on.

But [your] question could also be interpreted to say, ‘Do the technologies work?’ Well, yes they do, but there are problems with microarray technologies in terms of the comparability between the different platforms, and also the problems inherent in combining data that’s been created at different sites, which is not by any means solved and in some cases may not actually be a solvable problem because there’s a lot of noise in these systems.

So I think there’s still some scope for new systems to emerge, but that doesn’t mean that the genes and models that we build now are wrong. They may well prove to be right, but there is still scope for development.

I didn’t realize before reading your paper that toxicogenomic data is not generally publicly available. How much of a problem is that? Are you limited to what you can generate in house?

No, because there are data suppliers. We subscribe to GeneLogic's ToxExpress, which gives access to a large chunk of data to work on. More to the point, because it's uniformly generated at one site, it kind of has an internal comparability. There's also the ILSI [International Life Sciences Institute] initiative in the public domain and the NIEHS in the US. They're hopefully going to give a data corpus that will speak to comparability between sites and comparability between different approaches and between different platforms, and that's going to be a very useful innovation as we move forward.

What role do you see for the regulatory agencies such as the FDA in enforcing standards for this data?

There is a role in making data comparable, but that is an issue about platforms and technology, and not just about data interchange standards.

I think it would be quite premature for the FDA or anyone else to use these as uniquely predictive tools. Their promise is probably more in the relative behavior of different sets of compounds and the ability to rank and predict what we believe to be toxicity at an early stage rather than making very hard and fast judgments. It may be that hard and fast judgments are possible, but at the moment that would seem to be some way off, particularly given the difficulty between the platforms and comparability issues and so on.

So one person’s ‘yes’ may be someone else’s ‘no.’ Clearly, that’s not the case with other types of assays that you make regulatory judgments on, so there’s a way to go.

Several companies are working on structure-based ADME and toxicology prediction methods. Is that something you see promise in?

There will be similar caveats about chemical space coverage, but there may be more data.

What kind of impact is the release of the rat genome sequence having on the field of toxicogenomics?

The immediate prospect that the rat genome gives is the chance to unambiguously describe what the structural genes are and how they differ between rat and human. We have a pretty good picture, but it's not complete at the moment. Working from that, we'll be able to design chips that are actually completely representative of the rat genome. Commercial-grade chips for the rat genome lie somewhere in the future — maybe months, maybe a bit longer than that — but once those are there, that offers an opportunity to use those chips either with the models we have now or to redevelop them to build what will be more robust, more accurate models.

Certainly there are problems with any kind of chip design in terms of multiple splicing events, for example, leading to confusion as to exactly what transcripts it is you’re measuring. The opportunity of getting around those will be fantastic. And the other impact of the rat genome will be similar to anything you can say about humans. For example, you will be able to look at genomics measurements in rats and interpret them in terms of, for example, the promoter regions in the genes, and not just the genes themselves. So it opens up a lot of possibilities.

What kind of advances would you like to see in the field of toxicogenomics over the next year or several years?

We need more data at the moment. There’s a lot of data, but we need more data in the public domain so we can get a lot more input as to what is comparable and what isn’t comparable between platforms and approaches.
