UConn Team Developing Database to Track Lineage of Stem Cells via Gene Expression Markers


By Jim Kozubek

University of Connecticut geneticist Craig Nelson and students have launched a database that uses biomarkers to track the lineage of stem cells into fully differentiated states.

The Stem Cell Lineage Database, the first resource to track continuous cell lineages with biomarkers alone, is designed to make use of minimal sets of gene expression profiles to trace the differentiation of stem cell lines into tissue-specific cells such as bone, skin, muscle, or nerve cells. The resource is currently operational as a data bank, but a bioinformatics tool that will allow users to track cell lineages via gene expression transcripts is still under development.

A continuous lineage is “critical” for stem cell biologists, according to Nelson, since they are trying to make a cell progress along a lineage, and without expression data “you can’t confirm you’ve gotten a cell to move toward a target cell type.”

The database draws on disparate sets of transcript profiles that have been annotated previously but never assembled into lineages. The team is gathering these data from existing resources such as the Mouse Genome Informatics database and the Gene Names database hosted by the Human Genome Organization's Gene Nomenclature Committee.

Stem cell researchers have never had much luck determining cell type from morphological traits in dishes, where stem cells are coaxed into specific cell fates with established protocols.

“Morphology is actually quite tricky,” said Jason Gibson, a fifth-year grad student working on the project. The relationships between observable cell traits and developmental cell states “don’t always hold true.”

And while RNA transcripts have been used to identify cell states, those profiles are usually annotated as discrete snippets of information for a single cell stage within an informative in vivo context.

Stem cell researchers, however, often work with cells on growth plates outside the context of the organism, meaning they lose many of the usual indicators of cell type.

Nelson said that progressive work on stem cells in vitro created a need for a more robust system of tracking transcripts, a system that could be validated on its own independently of a documented biological context.

“Genes expressed in an anatomical and morphological context are definitive” for identifying connections between cell states and expression profiles, Nelson said.

“In a dish you don’t have anything but gene expression,” he added.

Currently, the SCLD includes data on more than 5,000 mouse cell and tissue types and 19 mouse cell lineages, as well as 98 human cell and tissue types and 10 human lineages. The lineage maps present the relationships between individual cell types, gene expression profiles for identifying each cell type, and information on the stimuli that cause cells to transition from one stage to another.
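One way to picture such a lineage map is as a directed graph in which each cell type carries a marker profile and each link carries the stimulus that drives the transition. The sketch below is illustrative only: the class, function, cell type, marker, and stimulus names are all hypothetical and are not taken from the SCLD, whose internal schema the article does not describe.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CellType:
    """Hypothetical record: a cell type, its markers, and its outgoing transitions."""
    name: str
    markers: set = field(default_factory=set)
    # Maps the name of a descendant cell type to the stimulus that triggers it.
    transitions: dict = field(default_factory=dict)


def trace_lineage(cell_types, start, target):
    """Breadth-first search from `start` to `target`.

    Returns a list of (cell type name, stimulus that produced it) pairs,
    or None if no path exists. The starting cell type has stimulus None.
    """
    queue = deque([[(start, None)]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1][0]
        if current == target:
            return path
        for child, stimulus in cell_types[current].transitions.items():
            if child not in seen:
                seen.add(child)
                queue.append(path + [(child, stimulus)])
    return None


# Toy lineage with placeholder names (not real biology).
lineage = {
    "stem": CellType("stem", {"geneA", "geneB"}, {"intermediate": "stimulus 1"}),
    "intermediate": CellType("intermediate", {"geneB", "geneC"}, {"terminal": "stimulus 2"}),
    "terminal": CellType("terminal", {"geneC", "geneD"}),
}

for name, stimulus in trace_lineage(lineage, "stem", "terminal"):
    print(name, "<-", stimulus)
```

A breadth-first search is used here simply because it returns the shortest chain of transitions; a real lineage resource could traverse its maps differently.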

Kelly Smith, a cell biologist at the University of Massachusetts Medical School and the curator of the Stem Cell Registry, a database launched in 2008 that has registered 1,000 human stem cell lines around the world with annotation and protocols for differentiating those lines, is familiar with the new database.

Smith said that his registry establishes a centralized database for the world to deposit operational information on proven cell lines, while UConn’s database provides “fine mapping of lineages through terminal differentiation” and offers a roadmap for “how to get from one place to another.”

As to the progress of annotating and tracking stem cells, “we’re really just beginning,” he said.

The UConn database is intended to be user-editable, with outside researchers able to contribute their own cell types, markers, and lineages.

Graduate students Edward Hemphill and Asav Dharia are working on bioinformatics for the project, including a clustering algorithm to select a minimal set of transcripts for each cell type, which the group calls Minimum Unique Marker Profiles, or MUMPs.
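The idea of a minimal unique marker set can be illustrated with a small sketch. The article does not describe how the team's clustering algorithm works, so the code below swaps in a plain exhaustive search: for one cell type, it looks for the smallest group of that type's markers that no other cell type expresses in full. The function name and the toy profiles are hypothetical, and the gene names are placeholders rather than real markers.

```python
from itertools import combinations


def minimal_unique_markers(profiles, target):
    """Return the smallest set of `target` markers not fully shared by any other type.

    `profiles` maps a cell type name to the set of markers it expresses.
    Returns None when every marker combination of `target` also appears
    in some other cell type, so no unique profile exists.
    """
    target_markers = sorted(profiles[target])
    others = [markers for name, markers in profiles.items() if name != target]
    # Try candidate sets from smallest to largest so the first hit is minimal.
    for size in range(1, len(target_markers) + 1):
        for candidate in combinations(target_markers, size):
            candidate_set = set(candidate)
            if not any(candidate_set <= other for other in others):
                return candidate_set
    return None


# Toy expression profiles with placeholder gene names (assumed for illustration).
profiles = {
    "stem": {"geneA", "geneB"},
    "progenitor": {"geneB", "geneC"},
    "blood": {"geneB", "geneC", "geneD"},
}

for cell_type in profiles:
    print(cell_type, minimal_unique_markers(profiles, cell_type))
```

In this toy data the progenitor profile is contained entirely within the blood profile, so presence of markers alone cannot single it out. Exhaustive search also grows exponentially with the number of markers, which is one reason a production tool would need a more scalable method.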

The team plans to first apply this tool to pick out transcripts to trace stem cells as they develop into blood cells, which they aim to demonstrate in the database as a proof of principle some time in the fall.

The researchers plan to release MUMPs as a freestanding tool on the SCLD site by the end of the year so that outside researchers can select transcripts for any cell lineage and see how well they trace it.

The database is so far sparsely populated, particularly with human data, since the researchers have just begun to populate that section of the resource. The UConn team aims to expand its human expression data so that it approaches the quantity of mouse data.

Even as its database begins to take shape, the research team is confronting longstanding questions about the nature of stem cells and the process of cell differentiation.

Stem cells are curious in that their expression of transcripts is stochastic, or random, and conforms to no fixed profile, while progressively differentiated cells trend toward specific expression patterns. Researchers have yet to use expression markers alone to track a continuous cell lineage.

“Any marker might be expressed across a population, but there are some markers in some lines that would never be expressed in others,” Gibson said. As to the precise tipping point in the frequency of a transcript that can differentiate a specific cell state, “those thresholds haven’t been established,” he said.

In addition, the team is relying on the scientific literature and must contend with questions such as whether the total number of cells in a report is high enough to establish reliability of an expression profile for a certain cell type, as well as gaps in the literature that prevent a seamless tracking of cells.

“That’s exactly the problem we knew we had to face going into this,” Gibson said. “We knew we couldn’t get data for every cell for every day, but we were able to group the data into a hierarchy of levels.”

David Weisman, a fifth-year graduate student at the University of Massachusetts who is familiar with the SCLD, said that its user-editable nature was conceptually intriguing.

“This notion of collaborative expertise is an emerging paradigm in the scientific community,” he said.

Future versions of the database will include better search capabilities, allowing users to search on more criteria and across species; the addition of gene regulatory networks controlling cell fate decisions; and the ability to incorporate other forms of experimental data, according to Nelson.

The researchers currently have a paper describing the website under review.
