At A Glance
Name: Mike Slack
Position: Lead Engineer, BAE Systems Advanced Information Technologies (formerly Alphatech)
Background: Senior Scientific Programmer, Harvard University, Bauer Center for Genomics Research — 2002-2003; Senior Scientist/Software Engineer, Click4More — 1998-2002; Associate Professor, mathematics, Western Michigan University — 1990-2001; PhD, mathematics, University of California, San Diego — 1990
Academic institutions are playing an increasingly large role in drug-discovery research. One such institution is Harvard University, particularly its Institute of Chemistry and Cell Biology (ICCB) and Bauer Center for Genomics Research, which recently published a collaborative paper in Science [2004 Nov 12; 306(5699): 1194-8] describing a technique for multidimensional drug profiling in cells using automated microscopy. Mike Slack, one of the mathematicians on that team, took a few moments last week to discuss with Inside Bioassays the challenges involved in the project and the technique’s commercial potential.
I understand you are a mathematician. How did you end up getting involved in this project?
Steve [Altschuler] and I were roommates in graduate school. And I had been a professor in mathematics for about 11 years, and then I left to work in a startup software company, and I did that for maybe a year and a half or so. That was around the time when a lot of those companies were going bust, so I found myself looking for work. I was living in San Diego, and Steve called me just to say hello one day, and we had actually talked a little bit before about possibly getting me interested in working in biology. Actually, there was a guy he worked with at Rosetta [Inpharmatics], and I had done an informal interview with that company a few years before I went to Harvard, but I decided against that at the time. But it did pique my interest a little bit. So when Steve called me [recently] and I was looking for work, I hate to admit I was a little desperate, so I was willing to leave my wife in San Diego and go to Boston, and I was willing to commit to them for at least six months to see how it went. That’s how it got started.
When I got out there, my knowledge of biology was pretty minimal. [Altschuler’s group] had had some conversations with Tim Mitchison (see Inside Bioassays, 5/25/2004) about this project, and really liked Tim and thought it might be a good match for me to work with him. So we met and talked about the project with Yan Feng, a fellow at ICCB who had generated the initial set of data. It was mostly coincidental in a lot of respects, but I had such high respect for Steve and Lani [Wu], and I had always wanted to work with them, so it kind of came at the right time.
What was the actual problem that needed to be solved, mathematically speaking, for this drug-profiling project?
I would say that in a lot of ways it’s really a combination of things. First off, there is just a whole lot of data to deal with. I don’t know if that’s really a mathematical problem; it's more of a data-processing problem. Our approach was that you have these individual cells, and you’re taking measurements on these cells — maybe 100 measurements per cell or so. What we wanted to be able to do was capture the multidimensional aspect of those measurements. If you just look at a single measurement like nuclear size, that might not tell you very much by itself. But between the multidimensional aspect and the fact that we’re looking at millions or billions of individual cells, we somehow had to coalesce all that information.

Statistically speaking, there are a number of issues that come up. The first is that for any given measurement, you have no idea what the distribution of that measurement should look like for normal cells. There’s no reason to expect it to be any known distribution. So we had to come up with statistical methods — non-parametric statistics is the technical term — but really it just means you don’t have a clue what this distribution should look like.

Then we had to deal with the fact that the variance in the measurements is going to be different for all the different kinds of measurements — the distributions are going to look different. You have to find some way of normalizing everything. For instance, if I look at nuclear size versus the intensity of some protein marker, I want to be able to compare those on a level playing field. So we had to come up with a way of normalizing the response across any sub-population of the cells. Once we did that, the other major thing was that because we were looking at dose response, generally speaking, we didn’t want our measurements to depend on the concentration of the compound with which we treated the cells.
We had to do the analysis in a titration-invariant manner. Once you have these normalized measurements, you can just think of the whole list of those as one big feature vector for a compound. If you just compare those in the standard ways, they may not match up because one compound may have a first real effect at one concentration, while another compound does at a different concentration. You have to be able to shift those around so they match up, and do that in a way that the algorithm basically picks it out for you.
All of those things were important. We used a lot of fairly standard tools, I think, but in slightly different ways than would normally be expected.
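The two ideas Slack describes — non-parametric scoring against a control distribution, and titration-invariant matching of dose-response profiles — can be sketched concretely. The statistic and data below are illustrative assumptions, not the paper's actual pipeline: the two-sample Kolmogorov-Smirnov distance is one standard non-parametric way to compare a treated population to a control (it assumes nothing about either distribution's shape and is already normalized to [0, 1], which puts different kinds of measurements on a level playing field), and a small shift search makes the compound-to-compound comparison insensitive to concentration offsets.

```python
# Illustrative sketch only: hypothetical data and statistic choices,
# not the code or exact methods from the Science paper.
import numpy as np

rng = np.random.default_rng(0)

def ks_score(treated, control):
    """Two-sample Kolmogorov-Smirnov distance: the largest gap between
    the two empirical CDFs. Non-parametric (no assumption about either
    distribution's shape) and always in [0, 1], so scores from different
    kinds of measurements are directly comparable."""
    grid = np.sort(np.concatenate([treated, control]))
    cdf_t = np.searchsorted(np.sort(treated), grid, side="right") / len(treated)
    cdf_c = np.searchsorted(np.sort(control), grid, side="right") / len(control)
    return float(np.abs(cdf_t - cdf_c).max())

# Control population: per-cell "nuclear size" values, unknown distribution.
control = rng.lognormal(2.0, 0.3, 5000)

# Hypothetical dose series for two compounds measured on the same feature.
# Compound B acts like compound A but needs one titration step more
# concentration, so its response profile is A's profile shifted by one.
doses_a = [rng.lognormal(2.0 + e, 0.3, 1000) for e in (0.0, 0.1, 0.4, 0.8, 1.0)]
doses_b = [rng.lognormal(2.0 + e, 0.3, 1000) for e in (0.0, 0.0, 0.1, 0.4, 0.8)]

profile_a = np.array([ks_score(d, control) for d in doses_a])
profile_b = np.array([ks_score(d, control) for d in doses_b])

def best_shift_distance(p, q, max_shift=2):
    """Titration-invariant comparison of two dose-response profiles:
    try every concentration shift up to max_shift steps and keep the
    smallest RMS distance over the overlapping doses."""
    best = np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = p[s:], q[:len(q) - s]
        else:
            a, b = p[:len(p) + s], q[-s:]
        best = min(best, float(np.sqrt(np.mean((a - b) ** 2))))
    return best

print(best_shift_distance(profile_a, profile_b))  # small: profiles match once aligned
```

In this toy setup, compound B mimics compound A at roughly one titration step higher concentration, so a naive position-by-position comparison of the two profiles disagrees, while the shifted comparison finds them nearly identical.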
To your knowledge, is this sort of analytical or statistical approach something that has been used a lot by drug discovery or pharma companies?
I wish I knew, actually. I don’t have a lot of experience speaking with people from drug companies, so I can’t really speak to what they do in this respect. What I’ve been told is that it’s being used to some degree. I think the difference between what’s probably typical there and what we did is that we really went the extra mile to combine information in a truly multidimensional fashion. Not only that, but when we compare distributions, we look at the whole distribution, and not just the mean or the median or quartiles, which is often the case. That’s definitely one of the advantages of being able to do this kind of individual cellular profiling versus something like microarrays, where really the best you can hope to do is find an average of something.
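The advantage of comparing whole distributions rather than means can be made concrete with a toy example (hypothetical data, not from the paper): a treatment that splits cells into two sub-populations can leave the population mean essentially unchanged, so a mean-based comparison sees nothing, while a whole-distribution statistic such as the two-sample Kolmogorov-Smirnov distance flags a large difference.

```python
# Toy illustration (hypothetical data): a treatment that splits cells into
# two sub-populations can leave the mean unchanged while the distribution
# changes drastically.
import numpy as np

rng = np.random.default_rng(1)

# Untreated cells: one population centered at 10.
control = rng.normal(10.0, 1.0, 10000)

# Treated cells: two sub-populations (say, arrested vs. escaping cells);
# the overall mean is still about 10.
treated = np.concatenate([rng.normal(7.0, 1.0, 5000),
                          rng.normal(13.0, 1.0, 5000)])

def ks_distance(x, y):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.abs(cdf_x - cdf_y).max())

print(abs(treated.mean() - control.mean()))  # near zero: means barely differ
print(ks_distance(treated, control))         # large: the shapes differ strongly
```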
This type of analysis, which seems to fall under high-content screening, is definitely moving toward live cells, something the paper discussed as a future improvement. Would this approach be applicable to live cells, and would any modifications need to be made?
I definitely think it will be applicable. The analysis techniques really apply in any situation where you take a bunch of populations of cells that have been perturbed in some way, and you’ve got some way of measuring characteristics of those cells. Your choice of what those characteristics are doesn’t really matter, whether they are fluorescent-tagged proteins, or morphological characteristics, or texture-based things. It’s pretty limitless in terms of the kinds of things it would apply to. Certainly we’re very interested in live-cell imaging. I just got an e-mail today from Steve [Altschuler], forwarded from someone whose company does that sort of thing and would be interested in applying these techniques. It would very likely be the case that we could do a lot better than we did in this paper in terms of the kinds of results you could produce. For example, we only looked at one cell line, which is fairly limited, and we also only looked at one time point, which is also fairly limited. There’s no reason why we would need to restrict ourselves that way. The only reason we did in the paper was that it was already challenging enough to get the thing done with those simplifications, and we wanted to at least be able to prove to ourselves that this would work.
Do you find that the instrumentation is limiting in any way, in terms of the kind of data that you want to collect?
It was actually rather surprising to us. There were a number of things about that setup that were not ideal. We had some technical challenges with the microscopes, and just the volume of data that we were trying to collect, and in a sense, we had limited resources. Some of the data was a bit noisy. We had wells that came in out of focus, and things like that. There were all sorts of problems that we encountered, but we were amazed at the degree to which the analysis techniques were fairly robust, and that we were able to produce fairly reasonable results in spite of all those limitations. So my guess is that as time goes on and people start looking at this more carefully, and we begin to refine the techniques, it’s going to get a lot better. The results from this particular paper are probably pretty poor compared to what’s possible. I can’t guarantee that, but I would bet on it.
The ICCB, for an academic setting, is a pretty high standard, but compared to what’s available in the commercial sector, I think we’re probably fairly low in terms of resources.
What are the immediate next steps for your lab regarding this research?
There are a couple of things. There has been a lot of talk about next projects that could be taken on, and Steve could give a better idea of what’s going on, although I’m sure he would want to play it a little close to the vest — we don’t want to necessarily give away all our ideas. But basically there have been a number of different collaborations we’ve talked about getting into. The common theme in many of those is that we want to head toward much better techniques for reconstructing system information, to understand particular pathways or to get to a systems-level understanding. In terms of the analysis we’ve got, we’re starting to cook up a few ideas of how we might improve it to get to that level.

I think there are also going to be some infrastructure-level challenges: how do we manage, store, move around, and analyze the data more efficiently? We really want to take on all these projects, and we’re not going to be able to succeed unless we can improve on that end of things. So we’ve got some software infrastructure for that.

The third area is really image processing. We did some pretty primitive things in image processing. In terms of what is state-of-the-art, what we did was really kind of laughable, almost. But again, one of the things we wanted to see was whether we could do it at a primitive level. There are things we want to do like bringing in live imaging and using more sophisticated image-processing software.

There will be a lot of new things to do in different directions, and one of the challenges is going to be figuring out how to parse out everyone’s time for that. But I can tell you that there has not been a shortage of interesting new directions in terms of the kinds of problems that could be worked on. I probably shouldn’t go into much more. But we haven’t really talked too much about promoting the drug-discovery aspect.
Our lab is more interested in research directions, and we figure sooner or later somebody else will pick up the drug-discovery direction and go with that. Personally, I’m very interested in doing whatever I can to enable the commercial sector to take what we did and make good use of it if that’s of interest to them, so they should feel free to contact us. We can probably be creative about how that would be done. It’s kind of funny. I think Harvard applied for some patents on this work, and the attitude was: Let’s do that so nobody else can stop us from doing this research. But I think we’re very interested in trying to give it away in terms of whatever people want to try and do with it.