New SAGE Informatics Toolkit Promises User-Friendly Resource for Expression Data


A new website from the NCI’s Cancer Genome Anatomy Project could broaden the user base for its SAGE (serial analysis of gene expression) data resource.

SAGE offers a quantitative and reproducible alternative to microarrays, but the higher costs associated with the sequencing-based process have limited its use. Unable to gather the large number of samples required to make SAGE data practical on their own, most researchers from individual labs rely on SAGEmap from CGAP and the NCBI or other, less comprehensive repositories of SAGE data.

However, according to CGAP collaborator Greg Riggins of Duke University Medical Center, SAGEmap leaves much to be desired, especially for inexperienced users or those looking to manipulate and process SAGE information. In particular, the resource lacks a means to view gene expression in an anatomical context. Seeking to create a more visual and intuitive tool, Riggins and collaborators from Duke, the São Paulo branch of the Ludwig Institute for Cancer Research, and the National Cancer Institute started from scratch and developed a second website called SAGE Genie.

SAGE Genie provides an automatic link between gene names and SAGE tags, which are short sequence tags of 10 to 14 base pairs that uniquely identify a transcript. In the SAGE process, these tags are linked together to form long molecules that are then cloned and sequenced. Expression levels are determined by counting the number of times a particular tag appears in the sequence. “One of the complaints I got consistently about SAGEmap was that it took a certain level of expertise in order to match your tag to your gene. What we’ve tried to do is automate that process,” Riggins said.
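The counting step can be illustrated with a minimal Python sketch. It is not SAGE Genie's code: the function names, the tag sequences, and the tags-per-million scaling are assumptions made for illustration. It only shows how tallying tag occurrences gives a relative expression estimate.

```python
from collections import Counter

def count_tags(tags):
    """Count how many times each tag occurs in one library."""
    return Counter(tags)

def tags_per_million(counts):
    """Scale raw counts so libraries of different sizes can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}

# Hypothetical 10-base tags read from one sequenced library.
library = ["AAAAAAAAAA", "CCCCCCCCCC", "AAAAAAAAAA", "GGGGGGGGGG"]
counts = count_tags(library)
normalized = tags_per_million(counts)
```

Scaling to a common total is one conventional way to compare a lab's own library against libraries of different sizes. The article does not say how SAGE Genie normalizes its counts.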

This was accomplished by winnowing a set of more than 6.8 million SAGE tags from 171 SAGE libraries down to 194,126 unique tags that were deemed “confident SAGE tags” (CSTs). Sandro de Souza of the Ludwig Institute led a team that then combined seven sources of cDNA sequences into 105 databases ranked by their CST list representation. The CGAP collaborators then wrote a software program to sort through the 105 databases for the best tag-to-gene matches.
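The article does not describe how the matching program works. One plausible reading of searching ranked databases is to check them in rank order and keep the first hit, as in the Python sketch below. The function name, the dictionary layout, and the gene labels are all hypothetical.

```python
def best_match(tag, ranked_databases):
    """Return (gene, rank) from the highest-ranked database containing the tag.

    ranked_databases is a list of dicts mapping tag -> gene name, ordered
    from most to least trusted. This ordering rule is an assumption, not
    the published method.
    """
    for rank, database in enumerate(ranked_databases, start=1):
        gene = database.get(tag)
        if gene is not None:
            return gene, rank
    return None  # no database contains this tag

# Two toy databases standing in for the 105 ranked ones.
ranked_databases = [
    {"AAAAAAAAAA": "GENE_A"},
    {"AAAAAAAAAA": "GENE_X", "CCCCCCCCCC": "GENE_C"},
]
```

Under this rule, a tag found in several databases resolves to the entry from the highest-ranked one, and a tag found nowhere returns no match.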

In addition, SAGE Genie’s Anatomic Viewer provides users with a tissue-centric view of gene expression data. Expression levels in normal and malignant tissues are indicated with a color-coded system. Users can also view alternative transcripts, redundant tags, and internal priming with SAGE Genie.

Building a better SAGE data viewer won’t only benefit current users of SAGEmap, however. Riggins said that the enhancements might encourage more labs to use SAGE on their own, because they would only have to run the experiment on a single tissue type of interest and then compare it to the other tissues already available through CGAP.

Additionally, Riggins said that many researchers use SAGE to select genes for custom arrays, and that improved access to the data should help that process as well.

SAGE is freely available for academic labs through its inventors at Johns Hopkins University, but Genzyme Molecular Oncology holds an exclusive license for commercial sales of the technology. In addition, the company sells its own cancer-based SAGE database through Celera and Compugen. Does the improved usability of the public resource pose a threat to commercial sale of the data? No, said Antony Newton, Genzyme’s director of commercial development. The CGAP SAGE project is an “impressive resource,” he said, and only serves as further validation of the SAGE technology.

“There are things in our database that aren’t in theirs and vice versa, so it just gives you an extended array of data. And more is better [in SAGE],” Newton said.

— BT
