In early 2000, AstraZeneca Pharmaceuticals opened a new 180,000-square-foot R&D center in Waltham, Mass. AstraZeneca R&D Boston houses the company’s North American Enabling Science and Technology (EST) center, which provides high-throughput technologies for the company’s researchers in several areas, including bioinformatics. Jim Fickett, who joined the company from GlaxoSmithKline last September as the global director of bioinformatics, recently spoke to BioInform about the center’s role in supporting pharmaceutical research.
How is AstraZeneca’s bioinformatics group distributed across the global organization?
It’s a little bit complex. I think every company solves the problem of the need for coordination versus the need for connection to many global groups a little bit differently. We have one central department that takes care of bioinformatics infrastructure within the company, and that’s called EST bioinformatics. In addition, there are bioinformatics groups within the research areas of the discovery organization. So there’s a cancer bioinformatics group, a respiratory bioinformatics group, an inflammation bioinformatics group, etc. And there’s an overall committee that I chair that coordinates the activity of all these groups.
How many AstraZeneca employees are engaged in bioinformatics work?
Bioinformatics, probably 50. I haven’t counted it up.
Does the EST group supply overall technical support to the individual research bioinformatics groups?
We build the tools and import the databases, and then the research bioinformatics groups apply them in the particular discovery projects. That’s the general model. There is, of course, some overlap.
To what extent do you find that you need to build your own tools rather than license them from a vendor?
We would certainly prefer to buy when possible. Our general policy is that we would rather buy than build. Unfortunately, the things that we need are very often not available. For example, we would very much like to have a gene catalog. I think every pharmaceutical company and most of the biotechs would like to have genomics information organized around genes: have a gene index and look up everything known about the gene, get a quick idea of the function and the likelihood that there would be therapeutic value in studying that gene. It’s just not available, so in every pharma where I know people, they’re building their own.
So even efforts like the Gene Ontology are not comprehensive enough to provide what you’re looking for in a gene catalog?
The Gene Ontology effort is certainly a step in the right direction, but it’s more a classification of function than it is an index of genes with all the available information around it. Closer to what we would need is the PSD database that was put together by Proteome [now a division of Incyte] and the GeneCards effort that was started at the Weizmann Institute. Both of those are quite useful, but they don’t have a lot of the features that we would need.
What other tools or resources are lacking in the bioinformatics market right now?
Another big area where we would like to see more work, and this is quite widely recognized, is in getting information out of the scientific and patent literature. You need information from the scientific literature to get some idea of what the disease connection really is, information from the patent literature and the web to find out what the competitive situation is, and things from the chemical literature to find out what molecules have been designed to inhibit the protein and to understand the business case around a particular potential drug target. I would guess that around 80 percent of the information you need isn’t in structured databases; it’s out in the text of articles in the library and in patent databases.
So far, the efforts to get that information out of text and into a form where you can process it automatically and put it in front of people quickly have been fairly rudimentary. There’s a lot of interest here and I expect things to ramp up quickly. I think there will be products here soon. We’re mounting an effort to not only keep a close eye on what’s going on out there and find the best collaborators, but also probably do some internal work.
Do you build your own analytical tools?
That situation is a lot better than it was a few years ago. I don’t think there’s much point in building one’s own gene structure prediction tools or domain discovery tools or sequence alignment tools. That area of bioinformatics is mature and you can get pretty good tools from outside.
In these areas that are mature, do your researchers tend to opt for commercial solutions or publicly available tools?
I don’t know if there’s a general rule. We certainly buy some software and take what’s available publicly. The cost is not as much of a factor as the quality of the software. We want state-of-the-art software and we’ll get it wherever we can.
How about collaborations with external bioinformatics groups? Do you bring in bioinformatics research expertise as well as tools?
We’re certainly open to that. We are a large company and have a fairly large bioinformatics group, and between the tools and the databases we import and the expertise that we have in-house, we’re in pretty good shape. We do hire consultants, and we do have collaborations with academic groups, but for 90 percent of our immediate need, we can do it with our own expertise.
What would you say the biggest challenges are in your job right now?
My top priority at the moment is hiring a group leader. The situation for hiring bioinformatics scientists is much better than it was a few years ago. You can now find trained bioinformatics people, but it’s still quite difficult to find experienced managers.
On the technical side, the completion of the genome has really changed things. A few years ago, all the effort at pharmaceutical companies was directed toward a couple of hundred drug targets, total. There were only a couple of hundred proteins that pharmaceutical companies were interested in. As the genomic information started coming out over the last few years, the focus of bioinformatics turned at least partly to finding new and interesting genes in all that data that was pouring out. A lot of those targets were jumped on quickly and went nowhere. A few of them panned out. But now the situation’s quite different. We’ve got the whole genome, we’ve got 30,000 or 40,000 genes. And what’s happened now is there’s this very large set of potential targets and a very complex information picture that is constantly evolving around each one of those. How do you figure out which ones are at the top of the list? That’s quite a difficult problem.
Tools have evolved over decades for keeping track of the progress of projects further down the line in discovery, where the target’s well established and people are worried about the chemistry or the regulatory process. But there is no infrastructure in place for decision support right at the beginning of the pipeline, to keep track of all this information around all the possibly interesting targets and to keep your priorities straight, so that the proteins you’re working on are really the ones that are most likely to be the most valuable in the long run.
That’s a very interesting challenge that we’re working quite hard on and where we’ll probably expand our efforts in the future.
It seems many people in the field are wondering what to do with all the targets identified through genomics right now.
It’s a tough challenge. In the past, people mostly worked on highly validated targets. You knew they had an important role in disease. So traditionally, the focus has been on the chemistry. Now that our understanding of pathology at the molecular level is going forward so rapidly, and there’s a large set of possibly interesting genes, we really need to take a different approach.
How are you handling that challenge at AstraZeneca?
We’re still working out how to handle it. Generally speaking, I would like to take the different key factors in what makes a good target — it needs to be assayable, it needs to be connected to the disease in a key way, there needs to be a history of finding small molecules that do inhibit proteins in the same family, there needs to be a market for the disease — you could write down about 20 key factors that go into the decision of what makes a target. We’re beginning to look at how we can gather information automatically about all these facets of the decision, and then try and look at that data manually in the cases that are of most interest to the company.
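The approach Fickett outlines, scoring each candidate target on a set of key factors and then reviewing the top of the list by hand, could be sketched roughly as follows. This is only an illustration under stated assumptions, not AstraZeneca's system: the four factor names come from the examples in his answer, while the equal weights, the 0-to-1 score scale, and the names `Target`, `priority`, and `shortlist` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical weights. The four factors echo the examples in the interview;
# a real list would have about 20, and equal weighting is an assumption.
FACTOR_WEIGHTS = {
    "assayable": 1.0,            # can the protein be assayed?
    "disease_link": 1.0,         # is it connected to the disease in a key way?
    "family_druggability": 1.0,  # have small-molecule inhibitors been found in the family?
    "market": 1.0,               # is there a market for the disease?
}


@dataclass
class Target:
    gene: str
    # Factor name -> evidence score in [0, 1] (assumed scale).
    # A missing factor is treated as "no evidence gathered yet".
    scores: dict = field(default_factory=dict)


def priority(target, weights=FACTOR_WEIGHTS):
    """Weighted average of the factor scores, with missing factors counted as 0."""
    total_weight = sum(weights.values())
    weighted = sum(w * target.scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def shortlist(targets, top_n=10, weights=FACTOR_WEIGHTS):
    """Return the top_n targets by priority, as candidates for manual review."""
    ranked = sorted(targets, key=lambda t: priority(t, weights), reverse=True)
    return ranked[:top_n]
```

In such a sketch, the automated step would only fill in the `scores` for every gene; the output of `shortlist` is what scientists would then examine manually, matching the split between automatic gathering and manual review described above.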