Sciences Institute, Berkeley, Calif.
Name: Andrew Gordon
Position: Research fellow, the Molecular Sciences Institute, Berkeley, Calif., 2001-present
Background: Network architect, Lucent Technologies, 2000-2001; Postdoc, biophysics, NYU Medical Center, 1999-2000; PhD, particle physics, Harvard University, 1998
A group from the Molecular Sciences Institute in Berkeley, Calif., has published a paper describing how it used homemade, open-source imaging software and automated microscopy to measure individual fluorophore dynamics, as well as protein and mRNA degradation rates, in single yeast cells.
The image-analysis software, called Cell-ID, takes advantage of specific features in brightfield images to identify and segment individual cells, and subsequently uses this information to further analyze specific cellular characteristics from fluorescence images of the cells.
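The two-step approach described above (find cells in the brightfield image, then measure them in the fluorescence image) can be sketched in a few lines of Python. This is an illustrative sketch, not Cell-ID's actual code. It assumes segmentation has already produced a label mask the same size as the fluorescence image, with 0 for background and a positive integer ID for each cell. The function name `per_cell_fluorescence` and its `background` parameter are hypothetical.

```python
from collections import defaultdict


def per_cell_fluorescence(labels, fluor, background=0.0):
    """Return {cell_id: (area, total, mean)} for every labelled cell.

    labels: 2D list of ints from brightfield segmentation (0 = background).
    fluor: 2D list of numbers, the fluorescence image, same shape as labels.
    background: per-pixel background level subtracted before summing.
    """
    area = defaultdict(int)
    total = defaultdict(float)
    for label_row, fluor_row in zip(labels, fluor):
        for cell_id, value in zip(label_row, fluor_row):
            if cell_id == 0:  # pixel belongs to no cell
                continue
            area[cell_id] += 1
            total[cell_id] += value - background
    # Mean is the background-subtracted intensity per pixel of the cell.
    return {c: (area[c], total[c], total[c] / area[c]) for c in area}


# Example: two cells found in brightfield, measured in fluorescence.
labels = [[0, 1, 1],
          [0, 1, 2],
          [2, 2, 2]]
fluor = [[5, 10, 12],
         [5, 14, 3],
         [4, 4, 5]]
print(per_cell_fluorescence(labels, fluor))
# {1: (3, 36.0, 12.0), 2: (4, 16.0, 4.0)}
```

Because the mask is built without looking at fluorescence, a dim cell is counted and measured exactly as a bright one is.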
The work, which appears in the February 2007 issue of Nature Methods, is yet another example of cell biologists with programming skills developing their own image-analysis software for the added flexibility it gives them in their specific research applications.
In addition, the research, which used Molecular Devices’ MetaMorph software as a building block, demonstrates how open-source and commercial image-analysis offerings can coexist, especially in an academic research environment and when commercial packages offer some degree of flexibility.
This week, CBA News caught up with the first author on the Nature Methods paper, Andrew Gordon, to discuss his lab’s work and his thoughts on open-source versus commercial image-analysis software.
How did you become interested in developing open-source image-analysis software for cell biology?
We were looking at the cell-to-cell variation in the expression of a fluorescent protein reporter. We just had a bunch of images, and the obvious first step is to try to extract the data from those images into the computer. That was part of the project: we were using these images to measure cell-to-cell variation. My concern initially was that, although the obvious way to get the cells from the images is to look at the fluorescence image, I didn’t like that because it kind of biases you toward brighter cells. If you’re trying to get a quantitative measurement of cell-to-cell variation in some feature, you really shouldn’t be selecting [the cells] on that feature. And also, the cells look bigger as they get brighter. There are ways to get around this, but I didn’t want to deal with it. I wanted to use a brightfield image to find the cells. It’s not a particularly hard problem, but it wasn’t something you could just tell the computer to do. We had software that controlled the microscope, but it was hard to get it to do what you wanted in terms of data analysis.
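Gordon's point about selection bias can be made concrete with a toy calculation. The numbers below are invented for illustration, and the cutoff `DETECTION_THRESHOLD` and the helper `cv` are hypothetical. The sketch only shows that dropping cells that fall below a fluorescence cutoff changes both the mean and the apparent cell-to-cell variation.

```python
from statistics import mean, pstdev

# Invented fluorescence values (arbitrary units) for every cell in a field,
# as a brightfield-based segmentation would find them.
all_cells = [20, 35, 50, 60, 80, 100, 140, 200, 260, 400]

# Hypothetical cutoff: a fluorescence-based segmentation only finds
# cells brighter than this.
DETECTION_THRESHOLD = 55
detected = [f for f in all_cells if f > DETECTION_THRESHOLD]


def cv(values):
    """Coefficient of variation: population standard deviation / mean."""
    return pstdev(values) / mean(values)


print(f"all cells: n={len(all_cells)} mean={mean(all_cells):.1f} CV={cv(all_cells):.2f}")
print(f"detected:  n={len(detected)} mean={mean(detected):.1f} CV={cv(detected):.2f}")
```

In this made-up example the detected cells have a higher mean and a lower coefficient of variation than the full population, so a fluorescence-selected sample would understate the very variation being measured.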
Once you’ve written that [software], and you have all the single-cell data and all the pixels and all the information about it, you’re kind of free to do what you want, and that’s the position we wanted to be in – to do what we wanted with the data.
Are there other examples out there of identifying cells in images without using fluorescence?
Yes, definitely. This problem of having a brightfield image and trying to find cells in it is pretty well studied. Biologists do it in different ways, and mathematically, people have written papers on 10 different ways to do this. I can think of three other ways I would have also liked to have tried. The advance wasn’t that we were able to find the cells. I think the advance was that we were able to find the cells in the context of a biology lab, where the people finding the cells were also doing the biology. We had a particular scientific purpose, and took it from there. It’s hard to explain how useful it is to have this code of our own. There are a lot of features you look at in the cell, and then you start thinking about ways to measure them – just even finding the membrane. It turned out that the membrane I found using the brightfield image corresponded pretty well to the membrane in the fluorescence image. But something like that, where you want to know the membrane fluorescence, if you have a commercial package geared toward doing that, you don’t really know what it’s doing, and you can’t really tune it or test it. Here, you can really try different ways of doing it and see which works best for you. Even if that doesn’t get published, it’s a path you kind of had to go down anyway to explore these things.
Do you think there is a divide between what’s available commercially and what researchers with some programming ability might want for their specific study?
It depends on what you want to do. If you’re doing something simple, like just counting cells, or if you’re not really interested in a quantitative measurement of the fluorescence, it’s probably fine to just use something that looks at the fluorescence and gives you the position. Also, there are things like MetaMorph, and packages that you can download, like CellProfiler [open-source software], which do a good job.
So is there a divide? I don’t know. If you’re really tied to the graphical interface, you’re really limited in what you can do, and you may not know it, because you might not think of things if you just aren’t able to do them. I used to be in particle physics, and there, everybody writes their own code and is very versatile in terms of programming. You would never use a commercial package to analyze the data. It seems to me that if you are really trying to do a quantitative measurement, you have to know what your data-analysis program is doing at every step. For instance, I didn’t try to deconvolve the data, because deconvolution adds a lot of things to it, and I didn’t want to do that.
It seems like the commercial packages more recently have been driven by the uptake of image-based screening in high-throughput drug discovery, and therefore would strive to be easier to use but maybe not as flexible. On the flip side, do you think that what you or other open-source developers are doing might find use in an industrial setting?
It makes sense that it would be driven by commercial applications. Academically, I think it’s very useful, because scientists should be writing their own code. They’re looking for small things that might not be served by a commercial package. It seems to me that the price of a programmer for a company isn’t that high compared to the cost of these software packages. Should they be doing it? I don’t know. It depends on what they’re looking for. If you have a big company gearing their whole product towards you, then you probably don’t need to hire people to code it yourself.
For our data, we see different things. For instance, we would look for the amount of fluorescence as a function of annular regions toward the middle of the cell. I doubt any commercial package would create an algorithm to do that, but it’s not a hard algorithm to do. Just the ability to do it is very useful because you can kind of test things. A lot of the commercial packages do have the ability to add your own features to it.
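A minimal sketch of that kind of measurement follows, under a simplifying assumption that is ours rather than the paper's: the cell is treated as a disc with a known center and radius, and pixels are binned by distance from the center into equal-width rings. The function `annular_profile` is hypothetical; a real implementation would more likely measure distance from each cell's segmented boundary.

```python
import math


def annular_profile(fluor, center, radius, n_rings):
    """Mean fluorescence in n_rings equal-width rings, membrane first.

    Simplifying assumption: the cell is a disc with the given center
    (row, col) and radius in pixels. Pixels at distance >= radius are
    treated as outside the cell.
    """
    cy, cx = center
    width = radius / n_rings
    sums = [0.0] * n_rings
    counts = [0] * n_rings
    for y, row in enumerate(fluor):
        for x, value in enumerate(row):
            r = math.hypot(y - cy, x - cx)
            if r >= radius:
                continue
            # Ring index counted from the middle, clamped against rounding.
            from_middle = min(int(r // width), n_rings - 1)
            ring = n_rings - 1 - from_middle  # 0 = outermost ring
            sums[ring] += value
            counts[ring] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Example: a 5x5 "cell" that is brightest at its edge.
fluor = [[[1, 4, 9][max(abs(y - 2), abs(x - 2))] for x in range(5)]
         for y in range(5)]
print(annular_profile(fluor, center=(2, 2), radius=3, n_rings=3))
# [9.0, 4.0, 1.0]
```

The outermost ring sits at the cell edge, so its value is also a crude stand-in for the membrane fluorescence discussed earlier in the interview.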
But the reason I mentioned physics is that everybody is kind of literate, programming-wise, and this isn’t really the case in biology. There are definitely labs that do really good stuff, but not every biologist knows how to write a program. That’s part of the reason you end up relying more on commercial software. For whatever reason [biology] hasn’t evolved to include programming as one of the skills, but maybe it will in the next 10 years.
In the Nature Methods paper, besides your Cell-ID software, you used some other software for auto-focus and for data analysis, correct?
Yes. MetaMorph comes with kind of an auto-focus routine. But it wasn’t doing what we wanted – it wasn’t finding the correct focus. I suspect that the reason is that contrast behaves differently in a brightfield image than in a fluorescence image. An in-focus fluorescence image has maximum contrast, and I think an in-focus brightfield image has minimum contrast. The thing was kind of bouncing back and forth between being above focus and below focus. We just wrote our own auto-focus routine to stick in there. But that was done in the context of MetaMorph.
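The contrast argument can be illustrated with a generic contrast-based focus search: score every image in a z-stack with a contrast metric, then take the maximum for fluorescence but the minimum for brightfield. This is a hedged sketch, not the routine the group wrote for MetaMorph. Normalized variance is just one common focus metric, the names `normalized_variance` and `best_focus` are hypothetical, and the code assumes images with a positive mean intensity.

```python
from statistics import mean, pvariance


def normalized_variance(image):
    """Contrast metric: pixel variance divided by mean intensity."""
    pixels = [p for row in image for p in row]
    return pvariance(pixels) / mean(pixels)


def best_focus(z_stack, brightfield):
    """Return the z position of best focus from a {z: image} stack.

    Fluorescence: best focus = maximum contrast.
    Brightfield (as described in the interview): best focus = minimum contrast.
    """
    scores = {z: normalized_variance(img) for z, img in z_stack.items()}
    pick = min if brightfield else max
    return pick(scores, key=scores.get)


# Toy 2x2 "images" at three z positions (invented values); the same stack
# is scored both ways only to show the logic.
stack = {
    -2: [[10, 30], [30, 10]],
    0: [[19, 21], [21, 19]],
    2: [[12, 28], [28, 12]],
}
print(best_focus(stack, brightfield=True))   # 0
print(best_focus(stack, brightfield=False))  # -2
```

One plausible reading of the bouncing Gordon describes is that a maximum-contrast search applied to brightfield images is drawn to the higher-contrast defocused planes on either side of true focus.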
That seems to be a strong example of commercial software coexisting with scientists who want to do their own programming.
That’s right. The commercial software – a lot of them give you hooks into their code, which is very useful. For instance, you get this Visual Basic routine, and you can have access to all the MetaMorph data, basically. You can communicate back and forth with MetaMorph as it’s running. The commercial software definitely has its place. There are people who are writing drivers so you don’t have to pay [Molecular Devices], but I was happy to use MetaMorph for this.
I’ll give you another example. We have a plate, and we want to do five images per well in the plate, and then skip to this well and come back over here – it can be done in MetaMorph, but it’s a little bit clunky, the graphical interface they have. But in a programming language it seems much easier to set that up and edit it. In that sense I think that if the commercial package is not geared exactly toward what you want, you can get around it.
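The plate example is the kind of task that is easy to express in code. The sketch below only builds an ordered list of stage positions; it does not call MetaMorph or any real microscope API. The well-naming scheme, the assumption that well A1 sits at the stage origin, the 300 µm spacing between images within a well and the function names are all hypothetical; the 9 mm centre-to-centre well spacing is that of a standard 96-well plate.

```python
from typing import Iterator, List, Tuple

WELL_PITCH_UM = 9000.0  # centre-to-centre spacing of a standard 96-well plate


def well_center(well: str) -> Tuple[float, float]:
    """Stage (x, y) in micrometres for a well name like 'B3', assuming
    well A1 sits at (0, 0) and rows run along y."""
    row = ord(well[0].upper()) - ord("A")
    col = int(well[1:]) - 1
    return col * WELL_PITCH_UM, row * WELL_PITCH_UM


def acquisition_plan(wells: List[str], sites_per_well: int = 5,
                     site_spacing_um: float = 300.0
                     ) -> Iterator[Tuple[str, int, float, float]]:
    """Yield (well, site, x, y) for each image, visiting wells in the given
    order and placing sites in a horizontal line centred on each well."""
    first_offset = -(sites_per_well - 1) / 2 * site_spacing_um
    for well in wells:
        x0, y0 = well_center(well)
        for site in range(sites_per_well):
            yield well, site, x0 + first_offset + site * site_spacing_um, y0


# Five images per well, jumping to C5 and then coming back to A2.
for step in acquisition_plan(["A1", "C5", "A2"]):
    print(step)
```

Reordering the wells or changing the number of images per well is then a one-line edit to the arguments rather than a change made through a graphical interface.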
You used an off-the-shelf microscope and other components to acquire images as opposed to a commercial automated microscope. Why?
It kind of was a microscope in a box. It has a moveable stage and filters, but we didn’t machine those ourselves. And MetaMorph controls all of that fine. There’s another group [at the University of California, San Francisco], the Micro-Manager project, which is trying to come up with drivers for all these things – basically creating an open-source microscope driver so you wouldn’t have to pay [for] MetaMorph. If every lab is buying commercial software, and it’s all doing the same thing for every lab, it’s almost like the NIH should just fund the group to do it. On the other hand, I kind of feel funny about the government competing with industry, so there are other issues there.
Is there any pressure or desire to commercialize your group’s software?
I wrote the code for myself, so it’s not professionally written code. It’s not easy to understand. I could help people to understand it. Also, part of the problem with it is that the initial application was for cell-to-cell variation, but other applications arose and I kind of hacked the code to stick them in. To commercialize this, you would need a few months to make it look nice – the code itself, to really maintain it. I don’t really have an interest in commercializing this. I wouldn’t mind if someone gave me money for it, but I don’t want to make a career of this, to identify myself as the programmer of this code.
You have to remember that although we’re giving this away, I don’t give you any support for it. If somebody called and asked for help, I’m happy to help, but in terms of a company, I’m not giving them full-time support. From a corporate perspective, open source isn’t useful to them. They need to hire someone to maintain it and work with it. For scientists, the power of the commercial packages is not necessarily in what you want to do. If you’re really into the flexibility and exploring things you don’t expect to find, then the commercial packages might not be flexible enough. I’m not dumping on commercial providers, but it’s kind of an underserved market in a way.