NEW YORK (GenomeWeb) – A team led by researchers at the University of Cambridge and Microsoft Research has combined gene expression profiles from single cells with computational approaches to build a model of transcription factor networks in early blood cell development.
The network modeling strategy, described in an article in Nature Biotechnology published online today, could be applied more widely to study the development of other organ systems, and will benefit from new technologies for gene expression analysis in single cells.
Berthold Göttgens, a professor of molecular hematology at the University of Cambridge, told GenomeWeb that his team has been interested in the development of blood cells from early progenitor cells for a long time. Blood, he explained, is not only associated with human disease, such as leukemia, but is also a good model system for tissue development because its cells are easily accessible. "The big scientific question is, how do cells make decisions during development to turn into specific types of cells?" he said.
In recent years, his group has taken advantage of new technologies to study developmental processes at the single-cell level. For the current study, the researchers looked at gene expression profiles of 46 genes – 33 transcription factors involved in blood cell development, nine marker genes, and four reference housekeeping genes – in a total of about 4,000 single cells with blood-forming potential, which they dissected from mouse embryos at four different time points during their early development.
For the experiment, they sorted individual cells into microtiter plates, where they lysed them and reverse transcribed their RNA into cDNA. They then loaded the cDNA into Fluidigm microfluidic chips, allowing them to perform TaqMan RT-qPCR reactions for up to 48 genes in 48 samples in parallel. The entire experiment required approximately 90 chips and cost about £28,000 ($43,000) in reagents, Göttgens said.
To understand the relationships between the 4,000 cells, the researchers turned to so-called diffusion maps, developed by collaborators at the Helmholtz Center in Munich, which allowed them to reduce the dimensions of the data. In theory, Göttgens explained, one would need to compare all 46 genes for each pair of cells – a 46-dimensional problem that he said is too difficult. Dimensionality reduction approaches – the most established being principal component analysis – help to display the relationship between two cells in only two or three dimensions.
Diffusion maps, which he said had not been applied to biological problems before, try to categorize the cells into connected groups, similar to a map of molecules undergoing diffusion. The maps did a better job at grouping the cells than existing dimensionality reduction tools, he said, and are "a valuable addition to the field that people should be aware of."
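As a rough illustration of the idea, and not the Helmholtz group's implementation, a diffusion map can be sketched in a few lines of plain Python: a Gaussian kernel turns distances between expression profiles into transition probabilities, and the first non-trivial eigenvector of the resulting random walk gives each cell a coordinate. The function name, the `epsilon` kernel width, the toy two-gene "cells," and the use of power iteration are all assumptions made for this sketch.

```python
import math

def diffusion_coordinate(cells, epsilon=1.0, iterations=200):
    """Return the first non-trivial diffusion coordinate for each cell."""
    n = len(cells)
    # Gaussian kernel on squared Euclidean distance between expression profiles.
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(ci, cj)) / epsilon)
               for cj in cells] for ci in cells]
    degree = [sum(row) for row in kernel]
    # Symmetric normalization; it shares eigenvalues with the Markov matrix D^-1 K.
    sym = [[kernel[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
           for i in range(n)]
    # The trivial top eigenvector of `sym` is proportional to sqrt(degree).
    norm = math.sqrt(sum(degree))
    top = [math.sqrt(d) / norm for d in degree]
    # Power iteration, projecting out the trivial eigenvector at every step.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        dot = sum(v * t for v, t in zip(vec, top))
        vec = [v - dot * t for v, t in zip(vec, top)]
        vec = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
    # Convert back to a right eigenvector of the Markov matrix.
    return [v / math.sqrt(d) for v, d in zip(vec, degree)]

# Toy data (hypothetical): two tight groups of cells measured on two genes.
toy_cells = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
             [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
coords = diffusion_coordinate(toy_cells)
```

On this toy input, the sign of the coordinate separates the two groups, which is the sense in which the method sorts cells into connected groups. A real analysis would use a numerical library and keep several eigenvectors rather than one.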
Collaborating with Jasmin Fisher at Microsoft Research Cambridge, the researchers then used the single-cell gene expression profiles to build an executable computer model of the transcriptional regulation networks.
Different genes are expressed during different stages of development, and "our goal was to identify the rules that underlie the switching on and off of genes," Göttgens said. This would be similar in principle to figuring out the rules of chess from photographs taken after each move that are presented in the wrong order. By looking at closely related photos, "you could stitch them back together to get the sequence of the whole game" and deduce the rules of the game from that, he said.
In a similar way, he and his team were able to place cells that differed in the expression of just one gene close to each other, resulting in a map of cell states. They then created networks by trying out different rules for each gene that would need to be consistent across the entire map. Overall, they obtained high-quality rules for 20 of the 46 genes and were able to build a computer model for those genes "where there is real mathematical logic that describes the relationship between the individual genes," Göttgens said.
Because the model is executable, it allows the researchers to make predictions about how cells will develop, based on which genes are expressed at the start of the process. It also enabled them to do in silico experiments, where they simulate the knockout or overexpression of a gene and have the model predict the state in which the cells will eventually stabilize. They then performed the same perturbation of a single gene experimentally and were able to confirm the computer simulation, he said.
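The map of cell states can be pictured as a graph in which two states are linked when they differ in exactly one gene. The sketch below assumes each cell has already been reduced to an on/off value per gene; the function name and the three-gene toy states are hypothetical, and the published analysis may differ in its details.

```python
from itertools import combinations

def build_state_graph(cell_states):
    """Link every pair of distinct states that differ in exactly one gene."""
    # Many cells share the same on/off pattern, so collapse duplicates first.
    unique_states = sorted(set(cell_states))
    graph = {state: set() for state in unique_states}
    for first, second in combinations(unique_states, 2):
        differing = sum(x != y for x, y in zip(first, second))
        if differing == 1:
            graph[first].add(second)
            graph[second].add(first)
    return graph

# Each tuple is one cell; each position is one (hypothetical) gene, 1 = on.
cells = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)]
graph = build_state_graph(cells)
```

In the chess analogy, each state is a photograph, and an edge joins two photographs that are one move apart.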
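To make "executable" concrete, here is a minimal sketch of a Boolean network simulation with a simulated knockout. The three genes and their rules are invented for illustration, and updating all genes at once (synchronously) is a simplifying assumption rather than a description of the published model.

```python
def simulate(rules, start, clamp=None, max_steps=100):
    """Apply the rules repeatedly until the state stops changing."""
    clamp = clamp or {}
    # Clamped genes are held at a fixed value: False mimics a knockout,
    # True mimics overexpression.
    state = {**start, **clamp}
    for _ in range(max_steps):
        next_state = {gene: bool(rule(state)) for gene, rule in rules.items()}
        next_state.update(clamp)
        if next_state == state:
            return state
        state = next_state
    return None  # no stable state found, for example if the network oscillates

# Hypothetical rules: A sustains itself, A switches on B, and C needs A and B.
RULES = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}
start = {"A": True, "B": False, "C": False}

wild_type = simulate(RULES, start)
knockout_b = simulate(RULES, start, clamp={"B": False})
```

In this toy network, the unperturbed run settles with all three genes on, while knocking out B leaves C switched off. That is the kind of prediction that can then be checked at the bench.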
While no model is a perfect simulation of reality, "we're proving that it's useful because it can be used to simulate experiments within seconds to pinpoint promising interventions for actual experimentation," he said.
To his knowledge, this is the first sophisticated approach to building models based on single-cell expression data. While others have taken simple measures of correlation to predict relationships between genes, his team's model "has these explicit rules which have a much deeper mechanistic meaning."
Building the model involved computational approaches originally developed by Microsoft Research for synthesizing computer code. "If they have specifications what a piece of code needs to be doing, then they can synthesize the code to do this by testing lots of different options," Göttgens said. This is similar to his team testing many different rules until they find those that satisfy their system. "The correlation-type analysis would be inferring things downwards, but this is building it up, and that's why it's called network synthesis," he said.
Going forward, researchers will be able to generate and analyze even larger single-cell datasets. Göttgens said he expects advances in single-cell RNA-seq technologies to allow researchers to profile the expression of thousands of genes per cell rather than just a few dozen. Ultimately, researchers would like to combine single-cell gene expression data with DNA methylation data, he added.
The results could have implications for leukemia research and drug development. Mutations in leukemia cells frequently affect transcription factor networks by targeting either transcription factors directly or their partner proteins, Göttgens said. "If we have a model of normal blood development, then we can simulate leukemia mutations within this model to get some mechanistic ideas of what the consequence of leukemia-causing mutations is," he said. This could provide clues to which pathways a drug could target to revert the behavior of leukemia cells to that of normal cells.
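The "testing lots of different options" step can be sketched as a brute-force search: list candidate rules for a gene and keep only those that agree with every observed transition. This stands in for the Microsoft Research synthesis tools, whose actual algorithms are not described here. The gene names, the small family of candidate rules, and the toy transitions are all assumptions.

```python
from itertools import combinations

GENES = ("A", "B", "C")  # hypothetical gene names

def candidate_rules(regulators):
    """Yield (description, function) pairs for a small family of Boolean rules."""
    for g in regulators:
        yield g, (lambda s, g=g: s[g])
        yield f"NOT {g}", (lambda s, g=g: not s[g])
    for g, h in combinations(regulators, 2):
        yield f"{g} AND {h}", (lambda s, g=g, h=h: s[g] and s[h])
        yield f"{g} OR {h}", (lambda s, g=g, h=h: s[g] or s[h])

def synthesize(target, transitions):
    """Return the names of all rules consistent with every transition."""
    regulators = [g for g in GENES if g != target]
    return [
        name
        for name, rule in candidate_rules(regulators)
        if all(bool(rule(before)) == bool(after[target])
               for before, after in transitions)
    ]

# Toy transitions (before, after); only the target's value in `after` is used.
transitions = [
    ({"A": 1, "B": 1, "C": 0}, {"A": 1, "B": 1, "C": 1}),
    ({"A": 1, "B": 0, "C": 0}, {"A": 1, "B": 0, "C": 0}),
    ({"A": 0, "B": 1, "C": 1}, {"A": 0, "B": 1, "C": 0}),
]
consistent = synthesize("C", transitions)
```

Of the six candidates for gene C, only "A AND B" fits all three transitions. The search builds up an explicit rule from the data instead of reporting a correlation between genes.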