At A Glance
Name: Zachary Pincus
Position: Graduate student, biomedical informatics, Stanford University School of Medicine
At the International Society for Analytical Cytology meeting two weeks ago in Quebec City, Canada, the topic of high-content imaging for basic research and drug discovery was more prevalent than it had been at previous meetings. One workshop, focused on image segmentation, may have been of particular interest to high-content imaging specialists: image segmentation must be performed with a relatively high degree of accuracy to enable many of the other commonly deployed image-analysis routines in high-content screening.
Though workshop participants presented their approaches to the image segmentation problem, they often digressed into a lively discussion about how modern-day high-content imaging might be able to learn from existing computer science and medical-imaging techniques.
One presenter championing such an approach was Zach Pincus, a biomedical informatics graduate student who works in the lab of Julie Theriot, a professor in the departments of biochemistry and of microbiology & immunology at Stanford University.
Pincus, whose background is a blend of biology and computer science training, has developed a technique for segmenting individual cells in various types of microscopy images — work for which he was recognized with the ISAC President's Award for Excellence, which is given to outstanding young investigators in the field. Cell-Based Assay News caught up with Pincus after ISAC to discuss his work and how it might be applied to high-content imaging in both industrial and basic research applications.
At the ISAC workshop, you mentioned that people in biological imaging might learn from disciplines like medical imaging and technologies like artificial intelligence. How does this tie in with the work that you're doing?
What motivates me is not building tools for super-high-throughput imaging endeavors. If you're running a giant project like that, then of course you can justify spending a lot of time customizing and tweaking a set of tools for whatever exact specifications you require. However, that leaves a lot of basic biology out in the cold, because it's just not that high-throughput. It's difficult to justify spending a lot of time tweaking automated tools to do the analyses that you like. You end up with a lot of biologists spending more time than they ought to drawing lines around things in Photoshop or MetaMorph.
What got me interested in the field was trying to think of tools that play a lot more toward the computer's advantage, and using some of the results from recent machine learning, statistical learning, and medical imaging to make the computer a lot smarter, and concomitantly reduce the burden on the biologist by having tools that can learn some of the assumptions that you'd otherwise have to painstakingly encode into an analytic script. Think about someone's script that analyzes their particular images and, say, segments out the foreground from the background, et cetera: In this script there are all of these assumptions about what the imaging conditions are, what kinds of cells you are looking for, and all sorts of other things. These assumptions are totally brittle, so if you change the imaging conditions or the magnification, the script would completely break, and just by looking at the script, it's not clear where those assumptions are necessarily encoded, because they're all very implicit.
Our idea was, 'Can we encode these assumptions explicitly in statistical models?' — statistical models of the imaging conditions, for instance what foreground pixels versus background pixels look like, and statistical models of cell shape. Anyone can build these models, because you can learn a statistical model from examples — so if someone gives you some sample pixel regions inside and outside of cells, and some sample cell shapes, that's all the input you'd need to build a model. Hopefully your code could then be much more agnostic to the imaging type. This work, of course, has brought me in very close contact with the statistical learning and machine learning communities, as well as the primary consumers of that work — people in robotics and image analysis.
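The learn-from-examples approach Pincus describes can be sketched in miniature. The following is a hypothetical illustration — not Pincus's actual code — assuming the simplest possible appearance model, a one-dimensional Gaussian per class fit to example pixel intensities the biologist points out:

```python
import numpy as np

def fit_pixel_model(samples):
    """Fit a 1-D Gaussian (mean, std) to example pixel intensities."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std() + 1e-9  # epsilon avoids divide-by-zero

def log_likelihood(pixels, model):
    """Per-pixel log-likelihood under a Gaussian appearance model."""
    mu, sigma = model
    return -0.5 * ((pixels - mu) / sigma) ** 2 - np.log(sigma)

def segment(image, fg_model, bg_model):
    """Mark a pixel as foreground (True) when it is more likely under
    the foreground model than under the background model."""
    img = np.asarray(image, dtype=float)
    return log_likelihood(img, fg_model) > log_likelihood(img, bg_model)

# The models are learned from example regions, not hand-coded thresholds:
fg = fit_pixel_model([200, 210, 190, 205])  # pixels sampled inside cells
bg = fit_pixel_model([20, 30, 25, 15])      # pixels sampled outside cells
image = np.array([[25, 30, 200],
                  [210, 22, 195]])
mask = segment(image, fg, bg)
```

Changing the imaging conditions then means supplying new example regions rather than rewriting the script — the assumptions live in the fitted models, not in the code.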
It seems there is a 'chicken and egg' thing going on here: You mentioned high-throughput imaging with exact specifications for certain projects, but a lot of those capabilities stemmed from the type of work that you're doing. Couldn't the work you and others like you are doing eventually be applied to high-content, high-throughput imaging?
Absolutely. The situation right now — and I believe that this isn't going to change — is that if you want to do one thing, and do it really well, then a hand-tuned analysis script that's a sequence of pixel-pushing operations is going to be the winner for some time. But if you want to do a bunch of different things, and you don't want to spend the effort of doing the hand-tuning, the computer can help us a lot more than it does with current vendor-supplied software. You might say, 'For high-throughput imaging, it really is just doing one thing and doing it a lot,' but you're throwing away a lot of data if you have this enormous data set and you're just going to do one thing to it. I believe that flexible analysis tools are necessary for your basic biological microscopist, but I think that industrial and higher-throughput applications, in general, could benefit from tools that could do fairly sophisticated pilot analysis without a large investment in manpower.
If you look at the tools that are provided by most current microscopy software packages, and then look at where those tools come from — where they were first discussed in the image-processing literature — it was in the 1960s or 1970s. There has been a sea change in the sort of analyses that people in computer science, and especially in the image-analysis field, have been developing since that time. We're left with the question, 'Does the field bifurcate, and do microscopists continue to rely on 1960s-era technology; or, do we make sure that we maintain links between current image analysis and what tools we can borrow for microscopy?'
That was the major point I made at the workshop: If we don't want to get left behind — and that's a big if, because there are plenty of times when using older, proven stuff is a good idea — then we need to start thinking seriously about these statistical techniques that draw on machine learning, linear algebra, et cetera. And to get a sense of who is doing interesting, applied work with that, and work that's starting to impinge on industrial applications, we need to look at the medical image-analysis community. They've been avid consumers of state-of-the-art image-analysis algorithms, and they have kept pace with the advances in image analysis. In medical image analysis, there are tools that are starting to get deployed in healthcare settings for actual patients. These people have a lot of experience in taking algorithms from computer science and serious image-analysis work and applying them [to their problems]. Those sorts of algorithms are, in general, driven by statistics, and explicit statistical models of what people are looking for, as opposed to being driven by sequences of commands that someone has cobbled together over time. Of course, it's somewhat biased for me to say that this is the way to go, because this is the work that I'm pursuing. This isn't the only way to move forward.
Many high-content screening vendors and users have been saying that new and novel image-analysis algorithms are going to drive the field forward, because the instrumentation is pretty good.
Yes. The take-home message from the work that I've been trying to do is that outside of the realm of high-throughput imaging, there is a lot of opportunity for sort of medium-throughput imaging, especially in terms of exploratory analysis, and ancillary analysis. You've got a data set, and you want to see what else is in that data set. In an exploratory setting, there are things the computer could be doing to make things easier for us. Even in the high-throughput setting, there are a lot of things computers could be doing to make things easier for us, but the performance might not quite be there. It might be absolutely reasonable to keep using things developed in the 60s, 70s, and 80s, because you can fly through 100,000 images a day with modern computers. But when you're trying to do an exploratory analysis, the immediate aim isn't to do that many images — it's to make a case for going ahead and really doing the high-throughput stuff. There is this whole second class of tools that seems to have been largely ignored, where performance is secondary, and the computer making it easy for people is primary. Perhaps I'm overstating the case — and this may not be of as much use in industry — but in the bench biology setting, having tools that can easily and quickly make initial low- and medium-throughput analyses, and that allow you to try 10 or 15 different hypotheses within a few days, would be extremely valuable.
ISAC didn't include a whole lot on image analysis, but did any other work in this area presented there make an impression on you?
Bartlomiej Rajwa, who is a research scientist in Paul Robinson's lab at Purdue — he and his colleagues have been fairly sophisticated users of machine learning and statistical classification techniques. This is for flow cytometry, but the sense is the same — given a data set that is really hard for people to handle properly, maybe we shouldn't be trying so hard to handle it properly; maybe we should be throwing it into the statistical meat grinder.
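The 'statistical meat grinder' idea — handing hard-to-gate, multi-parameter event data to a learned classifier — can be illustrated with a deliberately tiny sketch. This is a hypothetical example (a nearest-centroid classifier on made-up two-parameter events), not the Purdue group's method:

```python
import numpy as np

def fit_centroids(events, labels):
    """Learn one centroid per class from labeled multi-parameter events."""
    events, labels = np.asarray(events, dtype=float), np.asarray(labels)
    return {lab: events[labels == lab].mean(axis=0) for lab in np.unique(labels)}

def classify(event, centroids):
    """Assign an event to the class whose centroid is nearest."""
    return min(centroids, key=lambda lab: np.linalg.norm(event - centroids[lab]))

# Toy two-parameter events (e.g. scatter, fluorescence) with expert labels:
train = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]]
labels = ['debris', 'debris', 'cell', 'cell']
cents = fit_centroids(train, labels)
pred = classify(np.array([4.9, 5.2]), cents)
```

Real flow data would call for a more capable classifier, but the division of labor is the point: the expert supplies labeled examples, and the statistics do the gating.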
People in our field were well-represented at the workshop — Jelena Kovacevic [of Carnegie Mellon] has been really interested in actively using some recent techniques in image analysis and medical imaging to drive her group's cell-segmentation tools. [Kovacevic is working with Carnegie Mellon's Robert Murphy to develop image-analysis techniques for the 'location proteomics' approach — Ed. (see CBA News, 2/15/2005, for more.)]
The workshop focused primarily on image segmentation because it makes everything else in high-content imaging possible. Are there other problems after that, though, that might also benefit from new approaches?
Sure. I don't know if this is recognized as a problem, but I wonder if it will be a problem in the future. The basic idea now is that you segment your images, and then you generate a bunch of summary statistics, kind of scattershot — some about cell shape, some about fluorescence distribution, maybe the texture of the fluorescence distribution or some sort of co-localization statistics. But almost at random, totally ad hoc and a priori, you generate a whole bunch of statistics about the various cellular images. Then you go back and ask if you can use those statistics to do some sort of discriminative analysis to find cells that look apoptotic, or what not. I wonder if there isn't a better way of using some of the more recent techniques in statistics to build models of what we're looking for — to say, 'OK, we don't know what stats we want to use, but we know what apoptotic and non-apoptotic cells look like. All right, computer, you tell us the best statistics.' And not just in a feature-selection sense, but giving the computer a bunch of raw pixels, and asking it to tell us the best way of making the measurements, to make sure we're not losing anything by scattershot feature extraction. I don't know that this is a problem, but it always seems a little unsettling to me that we would be taking pixels and then just grinding them up into these often atomic and meaningless features. Given 100 features, you can't always put back together what the cell really looks like. Maybe there is some data-adaptive way to make these measurements on the cell so we can get a lot more precision out of the tools. My fear is that in doing some of these feature decompositions, you may be losing statistical power and not even know it.
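The data-adaptive alternative Pincus wonders about — letting the computer derive measurements from raw pixels rather than hand-picking features — can be sketched with the most familiar such technique, principal component analysis. This is an illustrative example of the general idea, not a method from the interview:

```python
import numpy as np

def pca_basis(cell_images, n_components=2):
    """Learn a measurement basis (principal components) directly from
    raw pixels, instead of hand-designing shape/texture features."""
    X = np.asarray([img.ravel() for img in cell_images], dtype=float)
    X = X - X.mean(axis=0)
    # SVD rows of Vt are the directions of greatest pixel-level variation
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_components]

def measure(cell_image, basis):
    """Project one cell image onto the learned basis: each coordinate is a
    'measurement' chosen by the data, not by the analyst."""
    return basis @ np.asarray(cell_image, dtype=float).ravel()

# Toy stand-ins for aligned, cropped cell images (4x4 pixels each):
cells = [np.arange(16).reshape(4, 4) * k for k in range(1, 6)]
basis = pca_basis(cells)
coords = measure(cells[0], basis)
```

Because the projection is linear and invertible up to the discarded components, such measurements can be mapped back to an image — addressing the worry that ad hoc features cannot be 'put back together' into what the cell looks like.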