NEW YORK (GenomeWeb) – A team working to uncover biomarkers for early detection of ovarian cancer has developed a novel strategy to filter gene expression microarray data using a comprehensive list of proteins that are believed to be secreted into circulation — the "secretome" — in order to identify promising targets for follow-on proteomic studies.
The investigators used the technique in an initial proof of concept, published earlier this month in Clinical Cancer Research, in which they identified two novel secretome targets in array data and then confirmed that both were elevated in the blood of women with cancer compared to those without.
"Traditionally, people have done this high throughput, large scale, brute force looking for proteins … and it dawned on us as the TCGA and other studies came through with all this genomic data — just gobs of it — that there must be a way to leverage that," Michael Birrer, the study's senior author and a lab director at Massachusetts General Hospital, told GenomeWeb this week. "Maybe not to identify the actual blood-based biomarker, but to enrich for potential target genes that encode proteins that are likely found in blood. That's what the secretome is."
In the study this month, Birrer and his colleagues described their development of a comprehensive secretome, essentially an amalgamation of several existing databases of proteins suspected to be secreted into circulation.
They then reported their method for using this database to create a virtual array, which, when mapped against existing gene expression data from Affymetrix human genome chips, could identify expression differences between cancer and normal samples that map specifically to proteins that are expected to be present in the blood.
Essentially, Birrer said, the virtual array provides a way to narrow the field from the thousands of proteins that are present in the blood overall, or the hundreds that appear elevated in cancer patients in broad shotgun proteomics studies, to just a handful of the most promising. This would allow researchers to bypass broad profiling of precious samples in favor of lower-throughput but more sensitive methods like multiplex ELISA, antibody array, or targeted mass spec.
"When you do these broad proteomic analyses, it's not like you don't get anything. You get too many things," Birrer said. "And in order to validate these [markers] in a meaningful way requires really precious specimens from patients who have been carefully, serially studied prior to developing a tumor. You don't want to waste that on 500 candidates. You wouldn't even be able to do it," he explained.
According to Birrer, while several databases of secreted proteins exist, none is comprehensive on its own, and many overlap with each other, making the task of developing the secretome master list a somewhat challenging one.
In addition, coupling that final secretome resource with array data proved more challenging than simply mapping each secreted protein to a genetic locus, Birrer said.
"The first problem is that there is redundancy between the different databases, and the second is that one protein may have five names, so you have to sort that out. … Then you have to map the right protein to the right gene ID — and there's confusion even there because there may be several isoforms for the gene, and the question is which one does the protein really map to," he said.
One downside of this process is that it necessarily eliminates some complexity in the secretome that might actually be useful. "For example, there may be alternate splicing or isoforms that may be important … but that's too deep for us to put into this analysis," Birrer said.
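The de-duplication and name-resolution steps Birrer describes can be sketched roughly as follows. This is a minimal illustration, not the group's actual pipeline: the two input lists (`db_a`, `db_b`), the small alias table, and the function names are all assumptions made for the example. Collapsing every name to a single canonical gene symbol is also what discards isoform-level detail.

```python
# Illustrative only: the database contents and alias table are toy stand-ins,
# not the resources used in the study.
ALIASES = {
    "CA125": "MUC16",   # CA125 is the protein product of the MUC16 gene
    "CA-125": "MUC16",
    "HE4": "WFDC2",     # HE4 is encoded by WFDC2
}

def canonical(name, aliases=ALIASES):
    """Return one canonical symbol for a protein name (upper-cased, alias-resolved)."""
    key = name.strip().upper()
    return aliases.get(key, key)

def merge_secretome(*databases, aliases=ALIASES):
    """Union several protein lists, collapsing duplicates and synonyms."""
    merged = set()
    for db in databases:
        merged.update(canonical(name, aliases) for name in db)
    return sorted(merged)

# Two hypothetical source databases with overlapping, differently named entries.
db_a = ["CA125", "FGF18", "HE4"]
db_b = ["MUC16", "fgf18", "WFDC2", "GPR172A"]

secretome = merge_secretome(db_a, db_b)
print(secretome)  # ['FGF18', 'GPR172A', 'MUC16', 'WFDC2']
```

Seven entries across the two lists reduce to four unique symbols once redundancy and synonyms are resolved. A real workflow would then map each symbol to gene IDs and array probe sets, which this sketch does not attempt.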
In other ways, though, the creation of the secretome was fairly broad, he added. "What we really didn't want to do was miss potential candidates so with the initial protein databases we [didn't just limit it to] secreted protein databases and signal peptide databases, but we even took transmembrane proteins … and tumor death proteins."
In the end, the group's secretome virtual array included a list of 16,521 Affymetrix probe sets representing the transcriptomic equivalents of the secretome member proteins.
Once this bioinformatic development was complete, Birrer and his colleagues went on to apply the method to expression data generated from advanced-stage serous ovarian cancer samples and compared that to normal ovarian cell and fallopian tube samples to see if any differentially expressed sites could be mapped to novel protein biomarkers.
To prioritize candidates, they also used pathway-based and tissue-based filtering approaches.
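As a rough sketch of the core filtering idea, the snippet below restricts a tumor-versus-normal comparison to probe sets on a secretome list. The probe IDs, expression values, and fold-change cutoff are hypothetical, and the sketch assumes log2-scale expression values. It does not cover the pathway-based or tissue-based prioritization, and a real analysis would use a proper statistical test with multiple-testing correction rather than a bare threshold.

```python
from statistics import mean

# Hypothetical probe set IDs and log2 expression values, for illustration only.
secretome_probes = {"probe_1", "probe_3"}

tumor = {
    "probe_1": [9.1, 8.7, 9.4],
    "probe_2": [7.9, 8.2, 8.0],
    "probe_3": [5.0, 5.2, 4.9],
}
normal = {
    "probe_1": [6.0, 6.3, 5.8],
    "probe_2": [5.1, 5.0, 5.3],
    "probe_3": [5.1, 5.0, 5.0],
}

def secretome_candidates(tumor, normal, probes, min_log2_fc=1.0):
    """Keep secretome probe sets whose mean log2 expression is higher in tumor."""
    hits = {}
    for probe in sorted(probes):
        if probe in tumor and probe in normal:
            log2_fc = mean(tumor[probe]) - mean(normal[probe])
            if log2_fc >= min_log2_fc:
                hits[probe] = round(log2_fc, 2)
    return hits

candidates = secretome_candidates(tumor, normal, secretome_probes)
print(candidates)
```

Only `probe_1` survives here: `probe_2` is strongly overexpressed but is not on the secretome list, and `probe_3` is on the list but shows no difference between groups.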
The results included some well-known and established biomarkers of serous ovarian cancer, including CA125 and HE4, as well as several novel markers, of which the group chose two, FGF18 and GPR172A, for further validation.
When they analyzed a second independent gene expression dataset focusing on these two markers, the researchers again found that both molecules were overexpressed in serous ovarian tumor samples compared to normal ovarian epithelial controls in a statistically significant manner.
The team also obtained a set of 20 ovarian cancer blood samples and 20 controls and used sandwich ELISAs to quantify levels of both FGF18 and GPR172A. As predicted by the secretome array data, the investigators saw that both were significantly increased in the ovarian cancer blood samples compared to the controls.
Birrer and his coauthors wrote that the results demonstrate the "potential value of the secretome array in translating genomic data into the discovery of blood-based biomarkers."
Though his lab is focused on ovarian cancer early detection, Birrer said that the secretome-Affymetrix array approach could clearly be applicable both to other cancers and to blood-based biomarker detection in other diseases.
And though he and his colleagues developed an analysis method specifically for Affymetrix gene expression array data, he said that the same approach could also be applied to RNAseq results and even whole-genome sequencing.
As part of its work as a member of the National Cancer Institute's Early Detection Research Network, Birrer's team is now using the secretome array to generate a larger list of promising potential ovarian cancer biomarkers and cross checking the results against those from broad proteomic profiling studies already underway by other members of the EDRN.
"We did all the secretome work and came up with our gene list, which we have in the paper, and then the Broad Institute has been doing proteomics and they came up with their list. The obvious thing to do is to cross check to see if there are markers in common," Birrer said. "We've started doing this, and lo and behold you do find a bunch coming from both directions, and we think they are the highest priority now to go to the next level."
He said that the groups have settled on a 200-protein candidate list and are having antibody assays created for the top 50 currently, which they plan to apply to plasma samples from ovarian cancer patients compared to controls.
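In computational terms, the cross-check between the genomic and proteomic candidate lists amounts to an intersection. The sketch below uses placeholder identifiers, since the actual lists are not reproduced in this article, and the function name is an assumption for illustration.

```python
# Placeholder identifiers; the real candidate lists are not reproduced here.
genomic_candidates = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
proteomic_candidates = ["GENE_C", "GENE_E", "GENE_A"]

def shared_candidates(ranked, other):
    """Return entries of `ranked` that also appear in `other`, keeping `ranked` order."""
    other_set = set(other)
    return [gene for gene in ranked if gene in other_set]

overlap = shared_candidates(genomic_candidates, proteomic_candidates)
print(overlap)  # ['GENE_A', 'GENE_C']
```

Preserving the order of the first list means that, if it is already ranked, the shared candidates come out in priority order.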
"We expect at least some will show up as being at higher levels in the ovarian cancer patients … and then comes the real test, which is to take the ones that really look good and test them on the [United Kingdom Collaborative Trial of Ovarian Cancer Screening] specimens," Birrer said.
The first data from the UKCTOCS, in which 200,000 women were randomized for ovarian cancer screening, are expected to be available in the next six months. Initial analysis has been focused on the protein CA125, which Birrer said he would be surprised to see yield a positive result.
Meanwhile, serially collected samples from the trial are available for about 200 women who eventually went on to develop ovarian cancer. Birrer and his colleagues plan to analyze these samples using the best candidates from their current secretome-based analyses with the hope that at least one can be seen rising in concentration prior to the detection of cancer in these study subjects.