Stanford Team Uses Machine Learning to Find New Indicators of Breast Cancer Survival from Image Data


By Uduak Grace Thomas

Researchers at Stanford University have developed a computerized approach that predicts breast cancer prognosis from tissue microarray image data by identifying known markers of the disease as well as a set of features that aren’t currently used by pathologists to determine survival rates.

Besides providing a more objective approach for making treatment decisions, the tool, named Computational Pathologist, or C-Path, could also be used to stratify patients into different categories for clinical trials, the developers said.

The development team, composed of computer scientists and pathologists at the Stanford School of Engineering and Stanford School of Medicine, described C-Path in a paper published this week in Science Translational Medicine.

In the paper, the researchers explain that while pathologists typically assess a cancer’s progress based on a set of structural features in breast cancer epithelial cells, C-Path measures 6,642 features in images of both the epithelium and the surrounding stroma. The team found that the stromal features were important in predicting patient survival.

In fact, a model based on features of the stroma “was a stronger predictor of outcome than one built exclusively from features of epithelial cells,” Andrew Beck, a doctoral candidate in biomedical informatics and the paper’s first author, said in a statement. “The stromal model was as predictive as the model built from both stromal and epithelial features.”

C-Path improves on current methods for determining the severity of breast cancer, in which pathologists typically look for specific cellular structures in microscope images and apply a scoring scheme — “an approach which has considerable variability,” the authors note in the paper.

Typically, pathologists look at what percentage of the tumor consists of tube-like cells; the diversity of the nuclei in the outermost cells of the tumor; and the frequency with which those cells divide, the scientists explained. These factors are then scored to determine survival rates and appropriate courses of therapy.

But, while these features provide useful information, “tumors contain innumerable additional features, whose clinical significance has not previously been evaluated,” Beck pointed out in a statement.

“We wanted to take a much broader view of the types of features that could be quantified on an image of cancer,” he told BioInform.

C-Path “strips away [the pathologist's] bias and looks at thousands of factors to determine which matter most in predicting survival,” Daphne Koller, a professor of computer science at Stanford and one of the paper’s authors, said in a statement.

But C-Path won’t replace pathologists, Matt van de Rijn, a professor of pathology and co-author of the study, pointed out. Rather, “we’re looking at a future where computers and humans collaborate to improve results for patients across the world,” he said.

A Customized Pipeline

According to the Science Translational Medicine paper, the researchers trained and tested the method on breast cancer tissue microarray samples taken from two separate cohorts — 248 patients from the Netherlands Cancer Institute and 328 patients from Vancouver General Hospital.

C-Path has a customized image processing pipeline that the team built on Definiens' Developer XD image analysis environment.

The researchers explain in the paper that the first step in the process was to break down tissue images into “superpixels” that were color-coded as stroma or epithelial cells. These superpixels and other measurements were then fed into a machine-learning algorithm that was used to build a classifier that distinguishes between the two types of cells.
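The classification step described above can be illustrated with a minimal sketch. The article does not say which learning algorithm or which superpixel measurements the team used, so everything here is an assumption for illustration: the two features (mean red intensity and a texture score), the toy training values, and the nearest-centroid rule standing in for the actual classifier.

```python
from math import dist
from statistics import fmean


def train_centroids(samples):
    """Average the feature vectors of each label.

    samples: list of (feature_vector, label) pairs.
    Returns a dict mapping each label to its centroid.
    """
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical per-superpixel features: (mean red intensity, texture score).
# These numbers are invented for the example, not taken from the paper.
training = [
    ((0.80, 0.20), "epithelium"),
    ((0.75, 0.30), "epithelium"),
    ((0.30, 0.70), "stroma"),
    ((0.35, 0.80), "stroma"),
]

centroids = train_centroids(training)
label = classify(centroids, (0.70, 0.25))
print(label)
```

A nearest-centroid rule is about the simplest supervised classifier available; it is used here only to show the shape of the task, in which labeled superpixels go in and a function that labels new superpixels comes out.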

Once the image regions were classified as either stroma or epithelium, objects within the cells, such as nuclei and cytoplasm, were further color-coded.

Next, the researchers calculated several features for each color-coded object — such as an object’s distance in relation to its neighbor — to generate the approximately 6,000 characteristics that C-Path takes into account when it builds its prognostic models.
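As a rough illustration of one such feature, the sketch below computes each object’s distance to its nearest neighbor. It assumes every object (a nucleus, for example) has already been reduced to an (x, y) centroid; the coordinates and the function name are hypothetical, and this is not the Definiens-based implementation the team built.

```python
from math import dist


def nearest_neighbor_distances(centroids):
    """For each centroid, return the distance to the closest other centroid.

    Assumes at least two objects are present in the image region.
    """
    distances = []
    for i, point in enumerate(centroids):
        others = (dist(point, other) for j, other in enumerate(centroids) if j != i)
        distances.append(min(others))
    return distances


# Hypothetical nucleus centroids in pixel coordinates.
nuclei = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
nn = nearest_neighbor_distances(nuclei)
print(nn)
```

Summaries of such per-object values across an image, such as their mean or spread, are the kind of quantity that could then serve as one entry in a feature vector describing a patient’s sample.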

When the method was applied to control data to predict patient survival rates after five years, patients it classified as high risk “showed significantly worse overall survival than cases predicted to be low risk,” the researchers said.
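The comparison between risk groups can be sketched in a few lines. The article does not describe how the high-risk cutoff was chosen or which statistical test was used, so the median split, the risk scores, and the outcomes below are all invented for illustration. A real analysis would also account for censored follow-up, for example with Kaplan-Meier estimates and a log-rank test.

```python
from statistics import median


def split_by_risk(scores):
    """Label each score 'high' if it exceeds the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


def five_year_survival(groups, survived_5y):
    """Return the fraction of patients alive at five years in each group."""
    totals, alive = {}, {}
    for group, survived in zip(groups, survived_5y):
        totals[group] = totals.get(group, 0) + 1
        alive[group] = alive.get(group, 0) + int(survived)
    return {group: alive[group] / totals[group] for group in totals}


# Toy cohort: model risk scores and whether each patient survived five years.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
survived = [False, False, True, True, True, True]

groups = split_by_risk(scores)
rates = five_year_survival(groups, survived)
print(rates)
```

In this toy cohort the high-risk group has the lower five-year survival fraction, which is the pattern the researchers reported, though their claim of significance rests on a formal statistical comparison rather than a raw difference in fractions.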

While such classifiers are common for genomic data sets, the authors noted that such an “unbiased data-driven approach” has so far not been used “in the study of cancer morphology from microscopic images of patient samples.”

And while several genomic tests exist to predict breast cancer prognosis, such as Genomic Health’s Oncotype DX and Agendia’s MammaPrint, the authors note that additional information can be gleaned from microscopic images of cancer samples, which offer a “level of resolution [that] facilitates the detailed quantitative assessment of cancer cells’ relationships with each other, with normal cells, and with the tumor microenvironment.”

C-Path’s framework could be used to develop prognostic models for other types of cancers, Beck told BioInform. In fact, the team is considering using it on datasets from patients with lymphoma and sarcoma, he said.

Because the method is built on Definiens’ software, pathologists would need to have the package installed in their laboratories in order to use C-Path, he said. The company has already incorporated some of the tool’s features into its software, although there are no plans yet to commercialize the technology, Beck said.

Furthermore, users would need to retrain the epithelium/stromal classifier to work on their image datasets before they can use it, although the group is working on improving the method so that this step won’t be required, he said.

The team also plans to extend C-Path to work on whole slide images, as well as to integrate patients' genomic information to try to discover new genotype-phenotype connections, Beck added.

Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.
