New SAGE Informatics Toolkit Promises User-Friendly Resource for Expression Data


A new website from the NCI’s Cancer Genome Anatomy Project could broaden the user base for its SAGE (serial analysis of gene expression) data resource.

SAGE offers a quantitative and reproducible alternative to microarrays, but the higher costs associated with the sequencing-based process have limited its use. Unable to gather the large number of samples required to make SAGE data practical on their own, most researchers from individual labs rely on SAGEmap from CGAP and the NCBI or other, less comprehensive repositories of SAGE data.

However, according to CGAP collaborator Greg Riggins of Duke University Medical Center, SAGEmap leaves much to be desired, especially for inexperienced users or those looking to manipulate and process SAGE information. In particular, the resource lacks a means to view gene expression in an anatomical context. Seeking to create a more visual and intuitive tool, Riggins and collaborators from Duke, the São Paulo branch of the Ludwig Institute for Cancer Research, and the National Cancer Institute started from scratch and developed a second website called SAGE Genie (

SAGE Genie provides an automatic link between gene names and SAGE tags, short sequences of 10 to 14 base pairs that each uniquely identify a transcript. In the SAGE process, tags are linked together to form long molecules that are then cloned and sequenced, and expression levels are determined by counting the number of times a particular tag appears in the sequence. “One of the complaints I got consistently about SAGEmap was that it took a certain level of expertise in order to match your tag to your gene. What we’ve tried to do is automate that process,” Riggins said.
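As a rough illustration of the counting step, the sketch below tallies tag occurrences in a single library and scales them to tags per million. The tag sequences are made up, and the per-million scaling is an assumption for illustration, not a description of how SAGE Genie itself normalizes data.

```python
from collections import Counter

def tag_counts(tags):
    """Count how many times each tag was observed in one library."""
    return Counter(tags)

def tags_per_million(counts):
    """Scale raw counts to tags per million (an assumed normalization)."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}

# Hypothetical 10-bp tags read from one sequenced library.
library = ["GTGACCACGG", "TTGGTCCTCT", "GTGACCACGG", "CCCATCGTCC"]
counts = tag_counts(library)
normalized = tags_per_million(counts)
```

Scaling by library size is one common way to make counts from libraries of different depth comparable.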

This was accomplished by winnowing a set of more than 6.8 million SAGE tags from 171 SAGE libraries down to 194,126 unique tags that were deemed “confident SAGE tags” (CSTs). Sandro de Souza of the Ludwig Institute led a team that then combined seven sources of cDNA sequences into 105 databases ranked by their CST list representation. The CGAP collaborators then wrote a software program to sort through the 105 databases for the best tag-to-gene matches.
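One simple way to picture a search across ranked databases is to walk them in rank order and accept the first one that contains the tag. The sketch below does exactly that. The database names, tags, and gene labels are hypothetical, and the first-hit rule is an assumption, not the CGAP team's actual matching program.

```python
def best_gene_for_tag(tag, ranked_databases):
    """Return (gene, database_name, rank) from the highest-ranked database
    that contains the tag, or None if no database has it.

    ranked_databases: list of (name, {tag: gene}) pairs, best-ranked first.
    """
    for rank, (name, mapping) in enumerate(ranked_databases, start=1):
        gene = mapping.get(tag)
        if gene is not None:
            return gene, name, rank
    return None

# Hypothetical databases, ordered from most to least trusted.
ranked = [
    ("curated_mrna", {"GTGACCACGG": "GENE_A"}),
    ("est_clusters", {"GTGACCACGG": "GENE_B", "TTGGTCCTCT": "GENE_C"}),
]

hit = best_gene_for_tag("TTGGTCCTCT", ranked)
top = best_gene_for_tag("GTGACCACGG", ranked)
miss = best_gene_for_tag("AAAAAAAAAA", ranked)
```

Because the list is ordered by rank, a tag found in several databases resolves to the gene from the most trusted source.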

In addition, SAGE Genie’s Anatomic Viewer provides users with a tissue-centric view of gene expression data. Expression levels in normal and malignant tissues are indicated with a color-coded system. Users can also view alternative transcripts, redundant tags, and internal priming with SAGE Genie.

Building a better SAGE data viewer won’t only benefit current users of SAGEmap, however. Riggins said that the enhancements might encourage more labs to use SAGE on their own, because they would only have to run the experiment on a single tissue type of interest and then compare it to the other tissues already available through CGAP.
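The single-tissue comparison Riggins describes could look something like the sketch below, which scales one lab's library and a set of reference libraries to tags per million and reports both values for a tag of interest. The tissue names, tags, and counts are invented, and the normalization is an assumption for illustration only.

```python
def per_million(counts):
    """Scale raw tag counts to tags per million for one library."""
    total = sum(counts.values())
    return {tag: 1_000_000 * n / total for tag, n in counts.items()}

def compare_to_references(own_counts, reference_libraries, tag):
    """For each reference tissue, return (own_level, reference_level)
    for the given tag, both in tags per million."""
    own = per_million(own_counts).get(tag, 0.0)
    return {
        tissue: (own, per_million(counts).get(tag, 0.0))
        for tissue, counts in reference_libraries.items()
    }

# Hypothetical in-house library and public reference libraries.
own_library = {"AAAAAAAAAA": 30, "CCCCCCCCCC": 70}
references = {
    "brain": {"AAAAAAAAAA": 10, "CCCCCCCCCC": 90},
    "colon": {"CCCCCCCCCC": 50, "GGGGGGGGGG": 50},
}

result = compare_to_references(own_library, references, "AAAAAAAAAA")
```

A tag missing from a reference library is reported as zero rather than raising an error, so sparse libraries can still be compared.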

Riggins added that many researchers use SAGE to select genes for custom arrays, and that improved access to the data should help that process as well.

SAGE is freely available for academic labs through its inventors at Johns Hopkins University, but Genzyme Molecular Oncology holds an exclusive license for commercial sales of the technology. In addition, the company sells its own cancer-based SAGE database through Celera and Compugen. Does the improved usability of the public resource pose a threat to commercial sale of the data? No, said Antony Newton, Genzyme’s director of commercial development. The CGAP SAGE project is an “impressive resource,” he said, and only serves as further validation of the SAGE technology.

“There are things in our database that aren’t in theirs and vice versa, so it just gives you an extended array of data. And more is better [in SAGE],” Newton said.

— BT

