Part two of a two-part series.
During IBC's Drug Discovery Technology conference in Boston two weeks ago, BioInform sat down with several vendors of informatics products with applications in in silico ADMET (absorption, distribution, metabolism, excretion, and toxicity). In the first half of the discussion, which BioInform ran last week [BioInform 08-15-05], representatives from three firms (Bio-Rad, Fujitsu, and Ingenuity Systems) spoke about the current state of the art, adoption trends within the pharmaceutical sector, and the factors driving use of these methods within pharma. This week, they discuss the quality of current training data, the impact of biological data on a historically chemical field, and the non-technical hurdles to adopting in silico ADMET.
Ian Welsford: manager of application science, biosciences group, Fujitsu.
Megan Laurance: product research scientist, Ingenuity Systems.
Gregory Banik: general manager, informatics division, Bio-Rad.
Greg, you mentioned training data. Is this information currently being captured effectively within pharmaceutical companies so that it can feed these [predictive ADMET] models and help build better training sets? Secondly, what is the status of information in the public domain in the ADMET area? Is that something that would help accelerate development of better methods?
Banik: There's an old saying that content is king, and I think that certainly holds true for model building. The 'garbage in, garbage out' philosophy holds as well. If you've got data that has been curated from various literature sources, generated by different people on different days in different labs on different continents, the variability between labs, let alone the regular experimental error that you get within any experiment, can make the data set useless. I think that is one of the biggest issues with regard to building high-quality predictive models: access to quality data. There's no consistency, necessarily, in data that's been curated from the literature. Even within a company, there could be different protocols for different experiments that then change over time, making the entire data set effectively incomparable. That said, we do what we can with what we have: as much as possible, we screen out data that are clearly outliers from either a training [set] or a validation set, if it's clear that they're not consistent, and hope for the best.
Pharmaceutical companies are sitting on mountains of data that, for obvious reasons, they're not apt to let out into the world. We'd love to get our hands on that data to be able to model it and see what could happen. I didn't attend the session [on predictive ADMET at DDT], but one of my colleagues did, and he said there was a lot of talk about building local models rather than relying on one-size-fits-all global models, and there's probably some benefit in ultimately using a combination of global and local models. As you get more data, further on in the process, you can build better models, or models, period. Early on, you don't have much data, and there's not much you can do but rely on global models and the data that is available at the time.
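Banik's description of screening inconsistent measurements out of training and validation sets can be illustrated with a minimal sketch. It assumes that each compound has replicate measurements of the same property reported by different sources, and it flags values using a modified z-score based on the median absolute deviation (MAD) with the commonly used 3.5 cutoff. The data layout, the function names, and the choice of a MAD rule are illustrative assumptions, not a description of any vendor's actual curation process.

```python
from statistics import median


def modified_z_scores(values):
    """Robust z-scores computed from the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values coincide with the median, so the MAD carries no spread
        # information; return zeros (flag nothing) rather than divide by zero.
        return [0.0] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to an ordinary z-score
    # when the data are roughly normal.
    return [0.6745 * (v - med) / mad for v in values]


def screen_outliers(measurements, threshold=3.5):
    """Split per-compound replicate measurements into kept and flagged records.

    `measurements` maps a compound ID to a list of (source, value) pairs, e.g. the
    same solubility value reported by several labs (hypothetical layout).
    """
    kept, flagged = {}, {}
    for compound, records in measurements.items():
        if len(records) < 3:
            # Too few replicates to judge consistency; keep them all.
            kept[compound] = list(records)
            flagged[compound] = []
            continue
        scores = modified_z_scores([value for _, value in records])
        kept[compound] = [r for r, z in zip(records, scores) if abs(z) <= threshold]
        flagged[compound] = [r for r, z in zip(records, scores) if abs(z) > threshold]
    return kept, flagged


# Usage: four labs report a value for one compound; lab D disagrees sharply.
data = {
    "cmpd-1": [("lab A", -2.1), ("lab B", -2.3), ("lab C", -2.2), ("lab D", 1.5)],
    "cmpd-2": [("lab A", -4.0), ("lab B", -1.0)],
}
kept, flagged = screen_outliers(data)
print(flagged["cmpd-1"])  # [('lab D', 1.5)]
```

A robust score is used because a single wild value inflates an ordinary standard deviation enough to hide itself; note that compounds with fewer than three reports pass through unchecked, which mirrors the "do what we can with what we have" caveat.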
Laurance: We are hearing from a lot of our customers that inherent variability from patient to patient, and from sample to sample, makes it really difficult to anchor on a particular set of markers. One way we've been able to help is to map those markers to higher-level, or global, concepts. So instead of saying that we're hitting three genes, we can say that we're hitting a pathway, or a larger function. Then, even if one of the key players in that function doesn't happen to be up in a particular patient, as long as that collection of genes significantly hits the pathway, you're still able to anchor on some biology that's relevant to that treatment and that compound. That's one way our customers are working around a lot of the variability in the data.
Welsford: The microarray community, with the MIAME standards, did a really good job of standardizing and harmonizing interpretation and analysis, but I'm not aware of any such global standards related to any of the factors associated with ADMET, with any of the letters in the acronym. There are a lot of expert systems and expert users, and the data is housed in silos across the organization.
Banik: There was an ACS meeting this spring with an entire session on whether there can be safe data exchange among pharmaceutical companies: whether it's possible to strip away the chemical [structure] and provide enough information about the descriptors that the data can be shared. That could move the industry away from the silo mentality, but even then, the variability between labs is an issue, and the lack of standardization that you mentioned could make it very difficult to get much value out of the data.
Welsford: A small number of players in the space are going to see an integrated pipeline as a significant competitive advantage. Others, however, view the processes they developed internally to prioritize ADMET modeling and prediction in their pipelines as their competitive advantage. At the end of the day, expecting them to give that up and share it with the rest of the world is probably naïve.
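The pathway-level anchoring Laurance describes is often done with an over-representation test: given the genes that are up in a patient, ask how unlikely it is to see that many members of a pathway by chance. The sketch below uses a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) as a stand-in technique; Ingenuity's actual scoring is not described in this interview, and the gene names, pathway definitions, and universe size are hypothetical. It shows two patients who hit different genes in the same pathway still pointing to that pathway.

```python
from math import comb


def hypergeom_sf(k, universe_size, pathway_size, n_hits):
    """P(X >= k): probability that n_hits random genes include k or more pathway members."""
    upper = min(pathway_size, n_hits)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_hits - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(universe_size, n_hits)


def pathway_enrichment(hit_genes, pathways, universe_size):
    """Rank pathways by over-representation of one patient's hit genes (lowest p first)."""
    hits = set(hit_genes)
    results = []
    for name, members in pathways.items():
        overlap = hits & members
        p = hypergeom_sf(len(overlap), universe_size, len(members), len(hits))
        results.append((name, sorted(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical pathways drawn from a universe of 1,000 measured genes.
pathways = {
    "PATHWAY_A": {"G1", "G2", "G3", "G4", "G5"},
    "PATHWAY_B": {"G6", "G7", "G8", "G9", "G10"},
}
# Two patients hit different genes, yet both overlap PATHWAY_A in three places.
patient_1 = ["G1", "G2", "G3", "G50"]
patient_2 = ["G2", "G4", "G5", "G51"]
for patient in (patient_1, patient_2):
    name, overlap, p = pathway_enrichment(patient, pathways, universe_size=1000)[0]
    print(name, overlap, f"p={p:.1e}")
```

The point of the design is that the per-patient gene lists differ, while the pathway-level result is the same, which is the kind of stability Laurance says customers are looking for. A real analysis would also correct for testing many pathways at once.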
I'm curious what you're seeing from customers in terms of bringing biological data and chemical data together. You're all coming at this from different sides, and probably working with different sets of customers in the same organizations, so I'm curious what the overlap is right now, and how you see that changing. Are expert users in the QSAR area looking to bring in more biological knowledge, for example?
Welsford: We see a lot of project teams involved in the analysis, and they're usually focused around some target. There will be a project manager with representatives from bioinformatics and cheminformatics, some biologists, some safety people, and a whole bunch of chemists. In the discussions we've had, we see that there is still a very high degree of intransigence on the part of certain medicinal chemists. They know their chemistries and will still advocate very strongly for certain scaffolds, so I think there is still some internal tension within organizations that prevents them from getting at some of this data. If we can get that to change, we'll see a real shift forward.
Laurance: My perspective is more from the biologists' side. I have to echo what you said: these are all cross-functional project teams at this point. As a scientist, I've always found it an interesting problem. In my brief stint in drug discovery, the chemists were right down the hall and we were all working on the same stuff, but the language barrier (not having a common biological language for the impact on the cell, or on the phenotype or disease) made it really difficult for all of us to move forward in the same direction.
I think a lot of what we're talking about is finding ways, as those project teams meet, to talk about the same thing. What are the clinical end points, how do you measure those, and what can we all look at together to talk about the project? But again, my perspective is much more the biologist's. The main entry point for us is very gene-focused and protein-focused, but we're hoping to change that at some point and make it a lot easier for chemists to come in and work from the drug backward.
Banik: Our focus is on the small-molecule side, but again, it's the same experience with cross-functional teams. Chemists, medicinal chemists, computational chemists: they're all coming together to try to address this issue that impacts them all.
Laurance: The way it's always been described to me is that it's very much an over-the-wall mentality. This is what happened, and then we'll toss it over to you guys. And that's all changing, but the dynamics of that are still being worked out.
Banik: My first job out of graduate school was in the cheminformatics group at Abbott Labs, and at that point, it was very much an insulated team. Now, by necessity, the reality is a much more complex, interconnected mathematical model.
Laurance: I'm still amazed at some structures inside pharmaceutical companies where they have a whole separate biomarker approach for drug efficacy, and they have nothing to do with what they're looking at in safety. But they're all using the same approaches and the same methods for finding measurable ways of capturing what's going on in a particular model or disease state. It's just going to take a while for that to change.
From a vendor perspective, what do you consider to be the key challenges right now in this field? What are the barriers to really getting these methods adopted, and how are you addressing those challenges?
Laurance: From my perspective, it's working with clients as much as possible to make sure they know how to take advantage of our technology: how they can leverage what's already in the knowledgebase and integrate their own knowledge and discoveries to build working models of what they think is going on with the molecules they have under consideration.
The second thing we're focusing on is making sure that we give them an avenue in from the small-molecule approach. As I said, right now it's very focused on asking questions about sets of genes and biomarkers. We need to let people go in from the small-molecule effects, backwards. So that's just making sure that we open up the workflow so that there really is a common language and a common workflow.
Banik: Ours is a small-molecule approach, and of course one of the things we need to worry about is integrating with the large-molecule side of things, because that's clearly what this is all about. The pharmaceutical industry is all about the interaction between large and small molecules. But in terms of widespread adoption of these in silico technologies for small molecules, domain applicability and model confidence are some of the big issues we see. Surrounding that is the issue of validation: the ability to validate that a model built by someone is applicable to the scaffold in hand. It's very easy to validate a model; using it in a more high-throughput fashion, and with confidence, is the biggest challenge we see to widespread adoption.
To address that, we've added the ability to display model confidence and to use it where the model is capable of generating [that information], but not all models do that, and that's a problem. We also try to address the confidence issue by creating consensus models, whether they're built from our partners' models or from a combination of global models and local models that a company has prepared from the data it has been able to acquire, and that consensus capability has been a very, very strong and well-received feature.
But one of the biggest issues is really going to be the data, and that is something that impacts us all. And unfortunately, it's something that we're not about to solve any time soon, just given the nature of the industry.
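Banik's points about domain applicability, model confidence, and consensus models can be tied together in one minimal sketch. It assumes each compound is represented as a set of fingerprint bits, treats a query as inside a model's applicability domain when its nearest training compound has a Tanimoto similarity of at least 0.5 (a hypothetical cutoff), and averages the in-domain predictions weighted by each model's reported confidence. Models that report no confidence get a fixed fallback weight, which is an arbitrary assumption. The class, function names, and thresholds are hypothetical and do not come from Bio-Rad's products; this is one simple way to implement the idea, not the vendor's method.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]  # set of "on" feature bits (hypothetical encoding)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


@dataclass
class ModelPrediction:
    name: str                           # e.g. "global" or "local" (hypothetical labels)
    value: float                        # predicted property for the query compound
    training_set: List[Fingerprint]     # fingerprints the model was trained on
    confidence: Optional[float] = None  # model-reported confidence in [0, 1], if any


def in_domain(query: Fingerprint, training_set: List[Fingerprint], threshold: float = 0.5) -> bool:
    """Nearest-neighbour applicability domain: is any training compound similar enough?"""
    return max((tanimoto(query, fp) for fp in training_set), default=0.0) >= threshold


def consensus(query: Fingerprint, predictions: List[ModelPrediction],
              default_weight: float = 0.5,
              threshold: float = 0.5) -> Tuple[Optional[float], float, List[str]]:
    """Confidence-weighted mean over the models whose domain covers the query.

    Returns (consensus value or None, spread of the values used, names of models used).
    """
    used = [p for p in predictions if in_domain(query, p.training_set, threshold)]
    if not used:
        return None, 0.0, []
    # Models that cannot report a confidence get a fixed fallback weight (an assumption).
    weights = [p.confidence if p.confidence is not None else default_weight for p in used]
    total = sum(weights)
    if total == 0:
        return None, 0.0, [p.name for p in used]
    mean = sum(w * p.value for w, p in zip(weights, used)) / total
    spread = pstdev([p.value for p in used])  # disagreement among the in-domain models
    return mean, spread, [p.name for p in used]


# Usage: a global model, an in-house local model, and a partner model that has
# never seen anything like the query compound.
query = frozenset({1, 2, 3, 7})
models = [
    ModelPrediction("global", 2.0, [frozenset({1, 2, 3, 4})], confidence=0.3),
    ModelPrediction("local", 3.0, [frozenset({1, 2, 3, 7, 8})], confidence=0.9),
    ModelPrediction("partner", 9.0, [frozenset({20, 21})]),  # no confidence reported
]
value, spread, used = consensus(query, models)
print(used)  # ['global', 'local']  (the partner model is out of domain)
```

Returning the spread alongside the consensus value lets a user see when the in-domain models disagree, which is one simple signal of low confidence even for models that cannot report a confidence of their own.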
Welsford: So far, we're seeing all of those barriers. I think there are also intellectual bottlenecks. Certain aspects of the process are limited in terms of understanding things of a biological nature, limited by perceived barriers, and limited by FTE availability. So what Fujitsu has attempted to do is provide a set of tools that can get into the hands of as many people as possible. We see that almost nobody can sit in a comfort zone in drug discovery. Gone are the days when you only had to work on a single scaffold. Now research scientists are working on three different project teams, and you see chemists asking, 'How come the hERG signal is something I have to worry about, when I slept through that physiology lecture?' People are being pushed beyond their comfort level, so there's a need to put both CPU power and tools in the hands of the users.
Having said that, there really isn't any one unified workspace or informatics flow; there's no standard, no single method, that provides an enterprise-wide way of sharing data.
Laurance: We're very focused on building tools that end-user biologists can use, so there [are] no huge hurdles to getting the data in and [no] hurdles to interpreting it. That said, there are questions that someone in tox is going to ask that not all of our other clients will ask. The challenge is to figure out what those questions are and give those users the content and the workflow to support them: they're not coming in with large sets of genes, they're just coming in with a question about hepatotoxicity for a compound. So for us, it's figuring out what the hurdles and major blockers are, and somehow translating that into a natural workflow and questions that a biologist can answer. It's hard to keep your eye on that end user, and as more people in the clinic start using these types of tools to decipher what's going on in certain diagnostic tests, it will be a whole other challenge for us.