European researchers have created a version of Google's PageRank algorithm — dubbed NetRank — that uses both gene expression levels and networks of relationships between gene products to rank proteins' relevance to the progression of cancer.
In a paper published recently in PLoS Computational Biology describing the algorithm, the authors note that the network-based approach improves upon biomarker-discovery methods that rely only on gene expression because "it can detect and therefore avoid markers that correlate with survival simply by chance or noisy measurements, but not due to an underlying biological causality."
After using the algorithm to rank about 20,000 genes from 30 pancreatic cancer patients by their relevance to the disease, the authors identified seven candidate biomarkers whose expression "reliably correlates with the patient survival time" and which could serve as a "molecular signature for reliable survival prediction."
These seven markers were then validated using immunohistochemistry in a separate cohort of 412 patients.
According to the researchers, analysis methods that correlate gene expression with survival times often identify biomarkers with "limited prediction accuracy, limited reproducibility, and unclear biological relevance."
NetRank's use of PageRank sets it apart from these approaches, however. Just as Google's algorithm uses "hyperlink information between web documents to better decide which documents are the most relevant ones" in response to a search request, NetRank "ranks genes according to their prognostic relevance" by coupling "gene expression measurements with a network of known relationships between the genes' products," the authors explained.
Stated another way, NetRank looks for correlations between gene expression and patients' survival times, as well as for neighboring genes that seem to play a role in the tumor's activities, Christof Winter, a postdoctoral researcher in Lund University's oncology department and the paper's first author, told BioInform.
Genes that have a higher correlation with survival and whose neighbors are also associated with survival are therefore ranked higher than those that may be linked to survival but lack the same degree of network association, he explained.
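The ranking Winter describes can be written as a damped, PageRank-style recursion. The form below is a sketch that assumes NetRank follows personalized PageRank, with the correlation scores serving as the restart vector; the symbols are illustrative rather than taken from the paper. Here s_j is gene j's absolute expression-survival correlation, m_ij is 1 if the products of genes i and j are known to interact and 0 otherwise, deg_i is the number of known interactions of gene i (genes with deg_i = 0 contribute nothing to the sum), N is the number of genes, and d, between 0 and 1, is a damping factor:

```latex
r_j^{(n)} = (1-d)\,s_j + d \sum_{i=1}^{N} \frac{m_{ij}\, r_i^{(n-1)}}{\deg_i},
\qquad r_j^{(0)} = s_j, \quad j = 1, \dots, N
```

With d = 0 the ranking reduces to correlation alone; as d grows, more of each gene's score is inherited from its neighbors, much as PageRank lets a web page inherit relevance from the pages that link to it.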
NetRank works by assigning each gene a score that reflects "the absolute correlation of its mRNA expression level with the patient survival time." "The network is then used to spread this correlation to [the gene's] neighbors and beyond," the paper explains. Genes with the highest NetRank scores are selected as potential signature genes for further testing.
By including network information in the mix, "gene products with many interactions" are given "a higher biological relevance since they can exert a bigger influence on a biological system." These "network neighbors" help the algorithm "ignore correlations between expression and outcome that have no underlying biological causality," the paper explains.
One example described in the paper is the HBA1 gene, which encodes hemoglobin alpha protein. According to the researchers, NetRank found a strong negative correlation with survival based on gene expression levels alone, but validation tests did not support this assertion, leading the team to conclude that this was merely a chance correlation.
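A hypothetical toy calculation (not the paper's HBA1 data) shows how a network term can push an isolated correlation down the ranking. It assumes a damped, PageRank-style update with an illustrative damping factor d = 0.5, where s_j is gene j's absolute correlation with survival and N(j) is the set of its network neighbors. Gene X has s_X = 0.9 but no known interactions; gene Y has s_Y = 0.6 and two neighbors, Z_1 and Z_2, each with s = 0.5 and each linked only to Y. At the fixed point:

```latex
\begin{aligned}
r_j &= (1-d)\,s_j + d \sum_{i \in N(j)} \frac{r_i}{\deg_i}, \qquad d = 0.5\\[4pt]
r_X &= (1-d)\,s_X = 0.5 \times 0.9 = 0.45\\
r_Z &= (1-d)\,s_Z + d\,\frac{r_Y}{\deg_Y} = 0.25 + 0.25\,r_Y \quad (\text{for } Z_1 \text{ and } Z_2)\\
r_Y &= (1-d)\,s_Y + d\left(\frac{r_{Z_1}}{1} + \frac{r_{Z_2}}{1}\right) = 0.3 + r_Z\\
\Rightarrow\; r_Y &= 0.55 + 0.25\,r_Y \;\Rightarrow\; r_Y = \frac{0.55}{0.75} \approx 0.733
\end{aligned}
```

Y ends up well above X (about 0.73 versus 0.45) even though X's raw correlation is higher. The same arithmetic shows the cost: a gene with no known links is stuck at (1 - d) times its correlation, however relevant it truly is.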
The paper also claims that NetRank improved biomarker prediction accuracy by up to seven percent compared with methods such as Pearson correlation when applied to data obtained from 30 pancreatic cancer patients.
Furthermore, the authors note that NetRank addresses two issues associated with finding cancer biomarkers: first, the process is difficult and time-consuming; second, markers found in different studies of the same type of cancer almost never overlap.
It also offers a more "objective" method than simply looking at gene expression levels and interaction data and trying to find patterns manually, which some researchers do, Winter told BioInform.
In their study, the researchers found that high expression of the STAT3, FOS, and JUN genes was associated with shorter patient survival, and that high expression of the SP1, CDX2, CEBPA, and BRCA1 genes was associated with better patient outcomes.
The team validated these biomarkers using real-time PCR to verify the microarray gene expression measurements, as well as immunohistochemical analysis of protein levels in samples from 412 patients, about half of whom had received adjuvant therapy.
From the seven-marker set, the authors derived "a six-gene signature for patients with adjuvant therapy and a five-gene signature for patients without adjuvant therapy," the paper states.
The researchers further claim that these signatures predicted survival more accurately than traditional clinical parameters such as tumor size, distant metastasis, and histological grade.
In one scenario, the investigators compared the predictive value of the biomarkers found by NetRank combined with clinical parameters versus clinical parameters alone in patients from the validation cohort who had been treated with chemotherapy.
The team found that the signature added nine percentage points of predictive value over clinical parameters alone, a figure based on a prediction accuracy of 70 percent for NetRank plus clinical parameters versus 61 percent for clinical parameters alone.
Although adding network information appears to improve biomarker prediction accuracy, the paper notes one potential drawback of NetRank's approach: it is biased toward genes with many known connections, at the expense of those whose associations are less well characterized in the literature.
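Assuming, as the two reported accuracies indicate, that the nine-point gain is the simple difference between them, the same improvement can also be expressed relative to the clinical-only baseline:

```latex
\Delta_{\text{abs}} = 70\% - 61\% = 9 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{70\% - 61\%}{61\%} \approx 14.8\%
```

The absolute figure is the one reported; the relative figure is only a re-expression of the same two numbers.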
Moving forward, the researchers are looking to collaborate with Dresden-based biotech RESprotect, as well as other interested groups, to put these biomarkers through further testing, Winter said.
Once they have been validated in follow-up studies, the team believes that the seven biomarkers could be used to develop new treatments and tests as well as to help guide therapy decisions for patients with pancreatic cancer.