Researchers at Carnegie Mellon’s Lane Center for Computational Biology recently announced that they have discovered a way to speed up important steps in an automated method for analyzing images of cell cultures.
According to the group’s findings, which appeared last month in the Journal of Machine Learning Research, these methods will yield better results from analyzing images in cell-based assays, Robert Murphy, an author on the paper and the director of the Lane Center for Computational Biology, told CBA News this week.
He added that the method would enable researchers to obtain a more accurate picture of what is happening in their assays, “meaning better accuracy for the classification of individual cells and a better conclusion about what is happening.”
According to Murphy, the analysis can be added on top of an existing high-throughput screen. “The traditional way of analyzing such a screen was to look at each individual cell, and then be done,” he said. “This algorithm adds an extra layer where you can get better accuracy in terms of the classification of each individual cell, in cases where their classification may be confused.”
In many situations, whether in cultured cells or in tissues, fields or images of cells contain heterogeneous cellular patterns. “The question becomes, ‘What does one do about this heterogeneity, if you are trying to do an analysis of the patterns in that field?’” said Murphy.
The standard way of dealing with this question is to classify each individual cell, and then “to let them vote, so to speak, by counting up how many individuals comprise each class of interest, and that majority will give the final decision.”
One can assign that field a label based on that voting or one can assign a condition based on voting of all the various images in that field. “The bottom line is that you end up with just a single answer as essentially the only way to combine information from multiple cells,” Murphy said.
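The voting baseline Murphy describes can be illustrated with a minimal Python sketch. The function name and the example class labels below are hypothetical and are not taken from the paper; the point is only that a minority pattern disappears once the field is reduced to a single answer.

```python
from collections import Counter

def field_label_by_vote(cell_labels):
    """Return the most common per-cell label in a field, plus the raw counts.

    Hypothetical helper for illustration; not code from the paper.
    """
    counts = Counter(cell_labels)
    winner, _ = counts.most_common(1)[0]
    return winner, counts

# Made-up field: eight cells classified as one pattern, two as another.
labels = ["nuclear"] * 8 + ["golgi"] * 2
winner, counts = field_label_by_vote(labels)
# The field-level answer keeps only `winner`; the two minority cells
# are visible in `counts` but not in the final decision.
```

Here the field is labeled with the majority class alone, which is the loss of information the group set out to avoid.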
“That was the starting point for our work, and the idea was to do better than such voting, because there might be many cases where the presence of a small percentage of some other pattern might, in fact, be very significant,” he added.
In addition, Murphy said, scientists may have a situation in which there is heterogeneity within a field, but their classifier may have difficulty distinguishing some of the classes that they are interested in.
As a result, he said, the question then becomes, “‘How do you know when the classifier is making mistakes in distinguishing these classes, and in fact, you have a mixture of two different patterns?’”
Each of the cells gets classified, and then “we build a graph that connects each of the neighboring cells,” Murphy said. “The fundamental idea is that neighboring cells are more likely to be of the same pattern than cells that are further apart.”
The method that is described in the paper addresses the problem of a classifier that has difficulty distinguishing among classes of cells. Murphy said the steps are to classify each individual cell, then to construct a graph that connects the cells that are near each other.
“For each node in that graph, count the neighbors of that node, with each cell being a node in that graph, and for each cell, count the classes of its neighbors, and use the information to push that particular cell towards the class comprising the majority of its neighbors,” he explained.
This method can be a little complicated, he conceded, because when “I am counting up the neighbors, the initial guess I made as to what the class of each neighbor might be could be wrong. What that says is [that] I should count the neighbors, change the class, and do it all over again, because each of the cells might have changed their class, and that would change the result in the next round,” said Murphy.
This inference is repeated iteratively until every cell converges, meaning that its assigned pattern no longer changes from one round to the next.
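The steps Murphy outlines (classify each cell, connect nearby cells in a graph, count the classes of each cell's neighbors, and repeat until nothing changes) can be sketched roughly as follows. This is a simplified neighbor-voting illustration, not the inference algorithm from the paper. The distance threshold `radius`, the rule that a cell's own current label counts as one vote, the tie-breaking rule, and the `max_rounds` cap are all assumptions made for the example.

```python
from collections import Counter
from math import dist

def build_neighbor_graph(positions, radius):
    """Connect every pair of cells whose centers lie within `radius`."""
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(positions[i], positions[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def relabel_by_neighbors(initial_labels, neighbors, max_rounds=50):
    """Repeatedly move each cell toward the majority class of its neighbors.

    All cells are updated together in each round. The loop stops when no
    label changes, or after `max_rounds` as a safeguard (an assumption).
    """
    labels = list(initial_labels)
    for _ in range(max_rounds):
        new_labels = []
        for i, nbrs in enumerate(neighbors):
            votes = Counter(labels[j] for j in nbrs)
            votes[labels[i]] += 1  # assumption: own label counts as one vote
            best = max(votes.values())
            if votes[labels[i]] == best:
                new_labels.append(labels[i])  # keep current label on ties
            else:
                # deterministic choice among the top-voted classes
                new_labels.append(min(l for l, c in votes.items() if c == best))
        if new_labels == labels:
            break  # converged: no cell changed its class
        labels = new_labels
    return labels

# Two separate groups of cells, each with one cell misclassified at first.
positions = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
initial = ["A", "A", "B", "A", "B", "B", "A", "B"]
graph = build_neighbor_graph(positions, radius=1.5)
final = relabel_by_neighbors(initial, graph)
# final == ["A", "A", "A", "A", "B", "B", "B", "B"]
```

In this toy field, a single field-wide vote would have to pick one class for all eight cells, while the neighbor-based pass corrects the two stray labels and keeps both patterns.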
His team has previously published three papers relevant to this work, in BMC Bioinformatics, Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), and Proceedings of the 2006 IEEE International Symposium on Biomedical Imaging (ISBI).
The first paper describes how to do this inference in a field with multiple cells while the others focus more on the cell results. The current paper, by contrast, focuses more on the algorithm.
Murphy said that all three papers demonstrate that the approach allows researchers to get better classification than if they just classify each individual cell and assume that is correct.
“The main event in this particular paper is describing how to do that inferencing quickly, because if you have a field of many cells, then the graph that you construct is very complicated and has many nodes,” he said. “That can take computers a long time to cycle through all of the cells and figure out how they are affecting each other, and for large fields, that becomes computationally intractable.”
“The paper shows how to do that particular analysis in a reasonable amount of time,” said Murphy. “So one can do this analysis of a field and take into account the patterns of each of the neighboring cells, and do that quickly enough so that it is practical enough to actually do.”
Murphy said that he and his colleagues have no plans to patent this algorithm because “we have described it completely in this paper, so we are encouraging people to use the methodology for their needs. The main advance described in this paper is the algorithm that is used to do this inferencing.”
The Carnegie Mellon group is about to submit another paper, said Murphy. The calculations in the current paper were done using images where the class of each cell was known, because “we created synthetic images by mixing together cells whose patterns we already knew.”
If an automated classification says that a particular cell is a member of a particular class, “how do you know if that is correct or not?”
Normally, investigators know that their classification of a cell is correct because they labeled it with their specific marker of interest, said Murphy. “If you have a mixed field of cells, however, you do not know which cell is of which class.”
In the studies described in this paper, Murphy’s team addressed that by “just synthetically, using the computer, merging the images of different cells where we knew what the image was, and we could test whether the algorithm would work.”
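A test of this kind can be sketched in a few lines of Python. The example below is illustrative only: the function names, the use of strings as stand-ins for cell images, and the mixing proportions are all hypothetical, and the paper's actual image synthesis is not reproduced here.

```python
import random

def make_synthetic_field(cells_by_class, n_per_class, seed=0):
    """Mix cells of known class into one field, keeping the true labels.

    `cells_by_class` maps a class name to a list of cell images (here just
    placeholder strings); `n_per_class` says how many to draw from each.
    """
    rng = random.Random(seed)
    field = []
    for cls, cells in cells_by_class.items():
        for cell in rng.sample(cells, n_per_class[cls]):
            field.append((cell, cls))
    rng.shuffle(field)
    return field

def accuracy(predicted, truth):
    """Fraction of cells whose predicted class matches the known class."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)

# Placeholder "images"; in practice these would be real single-cell images.
pool = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
field = make_synthetic_field(pool, {"A": 2, "B": 1})
truth = [cls for _, cls in field]
```

Because the true class of every cell in the mixed field is recorded when the field is built, any classifier's per-cell output can be scored directly with `accuracy`.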
In the paper they are about to submit, Murphy said that he and his colleagues labeled one of the populations of cells in a different color so that “we could tell which class it was supposed to be independently.”
That was kind of the “secret truth, which we do not let the computer know, and we asked whether or not the computer got it right.” That demonstrates a more direct application to a real-world situation, where researchers do not know what the proper class is.