Name: Quaid Morris
Position: Assistant professor, cellular and biomedical research, University of Toronto
BACKGROUND:
Postdoc, University of Toronto — 2002-2005
PhD, computational neuroscience, Massachusetts Institute of Technology — 2002
Technical consultant, Inzigo — 1999-2000
BS, computer science, Trinity College at the University of Toronto — 1996
Research assistant, computer science, University of Toronto — 1995-1996
In this month’s Nature Methods, a research team led by the University of Toronto’s Quaid Morris described a new approach to using expression profiling data to identify human microRNA targets.
According to the paper, the investigators developed a Bayesian data-analysis algorithm that was able to identify a network of 1,597 high-confidence target predictions for 104 human miRNAs.
This week, RNAi News interviewed Morris via e-mail about his work.
Could you provide a little background on your lab and its areas of research?
We are a computational biology lab … at the University of Toronto, [and] our research focuses on the regulation of gene expression. We try to understand the relationship between cis-regulatory signals and gene expression by statistical modeling [and] collaborate extensively with molecular biologists like Tim Hughes, Ben Blencowe, Charlie Boone, Andrew Emili, and Howard Lipshitz.
When and how did microRNA become part of that research focus?
Jim Huang, the first author [on the Nature Methods paper], and I started working on microRNAs about three years ago when I was a postdoc and Jim was a new student in Brendan Frey's lab [at the University of Toronto].
MicroRNAs suppress the expression of their targets by one of two mechanisms: degrading the mRNA transcript or repressing its translation. We wanted to try to tease apart these two mechanisms by comparing mRNA, microRNA, and protein expression data from matched tissue samples.
I thought we were in a good position to do this research because I had been involved in studies profiling mRNA, miRNA, and protein expression in mouse tissues, so we had access to all the data we needed. I also understood the data very well, which is important when building statistical models.
Can you discuss the miRNA target identification approach described in the Nature Methods paper? What questions/problems were you looking to address when you developed the data-analysis procedure? How did you do so?
Identifying microRNA targets on a large scale is quite hard. The best computational predictions are based on finding a conserved seven- to eight-base-pair seed in the 3' UTR of an mRNA. To get reasonable accuracy for these predictions, you have to use fairly strict conservation requirements.
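To make the seed-matching idea concrete, here is a minimal Python sketch. It assumes the common convention that the seed spans miRNA nucleotides 2 to 8 and reports only perfect 7mer-m8 and 8mer matches in a 3' UTR; it does not check conservation, is not TargetScanS, and the function names are hypothetical.

```python
# Hypothetical sketch: locate canonical seed-match sites for one miRNA in a 3' UTR.
# Assumptions: both sequences are RNA strings written 5'->3'; the seed is miRNA
# nucleotides 2-8; a 7mer-m8 site is the reverse complement of that seed, and an
# 8mer site is a 7mer-m8 site followed (3') by an A opposite miRNA position 1.
# Conservation across species, which the interview stresses, is NOT checked here.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, G<->C)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (0-based start, site type) for every 7mer-m8 or 8mer site in utr."""
    site = reverse_complement(mirna[1:8])  # pairs with miRNA positions 2-8
    hits = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        # The UTR base just 3' of the site sits opposite miRNA position 1.
        kind = "8mer" if end < len(utr) and utr[end] == "A" else "7mer-m8"
        hits.append((start, kind))
        start = utr.find(site, start + 1)
    return hits


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a mature sequence
    print(seed_sites(let7a, "GGCUACCUCAUUCUACCUCG"))  # toy UTR, not a real gene
```

On the toy UTR, this reports an 8mer site at position 2 and a 7mer-m8 site at position 12. Real predictors layer conservation and other site features on top of this raw matching step.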
We were trying to improve the sequence-based predictions by incorporating evidence of miRNA-mediated degradation coming from the expression profiles of their putative mRNA targets. Basically, we were trying to separate the wheat from the chaff by using expression data.
The basic idea is very simple and has been around for a while, but the devil is in the details. If an miRNA acts by degrading its mRNA target, there should be an inverse relationship between the measured expression profiles of the miRNA and its target — wherever the miRNA expression is high, the expression of its target should be low.
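The inverse relationship can be illustrated with a deliberately naive score: the negated Pearson correlation between a miRNA's profile and its putative target's profile across matched samples. This is a hypothetical sketch, not GenMiR++, and it ignores the confounding factors that the interview goes on to discuss.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with at least 2 samples")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def degradation_score(mirna_profile: list[float], mrna_profile: list[float]) -> float:
    """Hypothetical score: larger when the miRNA and its putative target are more
    anticorrelated (miRNA high where the target is low), as degradation predicts."""
    return -pearson(mirna_profile, mrna_profile)


if __name__ == "__main__":
    # Toy profiles over four matched tissues: the target falls as the miRNA rises.
    print(degradation_score([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]))
```

A score near 1 means strong anticorrelation. With only a handful of tissues, such correlations are noisy, which is one reason a more careful model is needed.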
The problem is that there are a lot of confounding factors like other miRNA regulators for the same target and differences in normalization of the microRNA and mRNA expression data. Our method, [termed] GenMiR++, scores miRNA and mRNA pairs based on the strength of this inverse relationship after taking into consideration these confounding factors.
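Why co-regulators matter can be shown with a small least-squares sketch: instead of correlating one miRNA at a time, fit the target's profile on all of its candidate regulators together, so that one miRNA is not credited with repression caused by another. This is a hypothetical plain-Python illustration, not the GenMiR++ model (which is Bayesian); its intercept absorbs only a constant offset and does not model per-sample normalization differences.

```python
# Hypothetical sketch (not GenMiR++): explain a target mRNA's profile with ALL of
# its candidate miRNA regulators at once. Each position in a profile is one
# matched sample. Fit: ordinary least squares via the normal equations, solved by
# Gauss-Jordan elimination so the example needs no third-party libraries.


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regulator profiles are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def joint_repression(mirna_profiles: dict[str, list[float]],
                     mrna_profile: list[float]) -> dict[str, float]:
    """Negated OLS coefficient per miRNA (model fitted with an intercept), so a
    positive value means 'more of this miRNA, less target, holding the others fixed'."""
    names = list(mirna_profiles)
    # Design matrix: an intercept column plus one column per candidate miRNA.
    rows = [[1.0] + [mirna_profiles[name][s] for name in names]
            for s in range(len(mrna_profile))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, mrna_profile)) for i in range(k)]
    coef = solve(xtx, xty)
    return {name: -c for name, c in zip(names, coef[1:])}


if __name__ == "__main__":
    regulators = {"miR-a": [1.0, 2.0, 3.0, 4.0, 5.0], "miR-b": [2.0, 1.0, 4.0, 3.0, 6.0]}
    target = [7.0, 5.5, 2.0, 0.5, -3.0]  # toy data built as 10 - 2*a - 0.5*b
    print(joint_repression(regulators, target))
```

On the toy data, the fit recovers a repression strength of about 2.0 for `miR-a` and 0.5 for `miR-b`. A probabilistic model can go further by also estimating how likely each candidate pair is to be a real target.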
GenMiR++ starts from an initial set of microRNA target predictions that come from comparing the mRNA and miRNA sequences and then assigns a confidence score to each of these predictions using the paired miRNA and mRNA expression data. In this study, we used TargetScanS target predictions for our initial set, but we could have used other sequence-based methods.
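The two-stage structure, with sequence-based candidates first and expression-based re-scoring second, can be sketched as below. The scoring function is left as a parameter because GenMiR++'s own score is not reproduced here; `neg_covariance`, the threshold, and the toy data are hypothetical stand-ins.

```python
from typing import Callable

Profile = list[float]


def rescore_candidates(
    candidates: list[tuple[str, str]],
    mirna_expr: dict[str, Profile],
    mrna_expr: dict[str, Profile],
    score: Callable[[Profile, Profile], float],
    threshold: float,
) -> list[tuple[str, str, float]]:
    """Score each sequence-predicted (miRNA, gene) pair with matched expression data
    and keep the pairs scoring at least `threshold`, best first. Pairs that were
    never predicted from sequence are never considered."""
    kept = []
    for mirna, gene in candidates:
        if mirna not in mirna_expr or gene not in mrna_expr:
            continue  # no matched expression data, so the pair cannot be scored
        s = score(mirna_expr[mirna], mrna_expr[gene])
        if s >= threshold:
            kept.append((mirna, gene, s))
    return sorted(kept, key=lambda t: t[2], reverse=True)


def neg_covariance(m: Profile, t: Profile) -> float:
    """Hypothetical stand-in score: negated sample covariance of the two profiles."""
    mm, mt = sum(m) / len(m), sum(t) / len(t)
    return -sum((a - mm) * (b - mt) for a, b in zip(m, t)) / (len(m) - 1)


if __name__ == "__main__":
    candidates = [("miR-A", "G1"), ("miR-A", "G2"), ("miR-B", "G3")]
    mirna_expr = {"miR-A": [1.0, 2.0, 3.0], "miR-B": [3.0, 2.0, 1.0]}
    mrna_expr = {"G1": [3.0, 2.0, 1.0], "G2": [1.0, 2.0, 3.0]}  # G3 lacks data
    print(rescore_candidates(candidates, mirna_expr, mrna_expr, neg_covariance, 0.0))
```

Because expression data only filter and rank the sequence-based list, swapping TargetScanS for another sequence-based predictor changes only the `candidates` input.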
How did you test the procedure?
First, we used miRNA and mRNA expression profiling data from normal and cancerous human tissues published by Todd Golub's lab, along with the TargetScanS sequence-based predictions, to predict miRNA targets in human.
No neural tissue was represented in this set, which is important because we then tested the method on neural-derived tissues.
After scoring the set of sequence-based miRNA target predictions, GenMiR++ chooses the subset that is best supported by the expression data. We then validated these predictions experimentally and computationally by comparing them to the original TargetScanS predictions.
Tomas Babak, who was a student at the time in Ben Blencowe and Tim Hughes' labs [at the University of Toronto], did the experimental validation in collaboration with Tim Corson, who was a student in Brenda Gallie's lab [at the University of Toronto].
They profiled miRNA expression levels in human retinoblastoma and found that let-7b, which is normally expressed in human retinal cells, was depleted in retinoblastoma. They transfected let-7b back into retinoblastoma, [then] measured mRNA expression levels before and after the transfection.
Compared to TargetScanS predictions, GenMiR++-predicted targets were much more likely to have higher expression in retinoblastoma than in both healthy retina and let-7b-transfected retinoblastoma.
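The retinoblastoma comparison amounts to asking, for each prediction method, what fraction of its predicted let-7b targets are higher in retinoblastoma than in both healthy retina and let-7b-transfected retinoblastoma. A minimal sketch, assuming one expression value per gene per condition (the dictionaries and gene names are hypothetical) and omitting the statistical test needed to compare two methods' fractions:

```python
def fraction_derepressed(targets: list[str],
                         rb: dict[str, float],
                         retina: dict[str, float],
                         rb_let7b: dict[str, float]) -> float:
    """Fraction of targets with rb[g] > retina[g] and rb[g] > rb_let7b[g], i.e.
    higher where let-7b is depleted than in both let-7b-containing conditions.
    Targets missing from any of the three expression tables are skipped."""
    usable = [g for g in targets if g in rb and g in retina and g in rb_let7b]
    if not usable:
        raise ValueError("no target has expression values in all three conditions")
    passing = sum(1 for g in usable if rb[g] > retina[g] and rb[g] > rb_let7b[g])
    return passing / len(usable)


if __name__ == "__main__":
    rb = {"G1": 5.0, "G2": 5.0, "G3": 1.0}        # retinoblastoma (toy values)
    retina = {"G1": 1.0, "G2": 6.0, "G3": 2.0}    # healthy retina
    rb_let7b = {"G1": 2.0, "G2": 1.0, "G3": 0.0}  # retinoblastoma + let-7b
    print(fraction_derepressed(["G1", "G2", "G3"], rb, retina, rb_let7b))
```

Computing this fraction separately for the GenMiR++ and TargetScanS target lists gives the kind of head-to-head comparison described above.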
Jim Huang and Tomas Babak did the computational validations. We reasoned that if GenMiR++ is doing a good job of identifying the real miRNA targets, then the sets of targets of individual miRNAs that GenMiR++ identifies should have more consistent functional annotations than the initial target sets.
The difference in enrichment of Gene Ontology functional annotations between the GenMiR++ and the TargetScanS target sets was overwhelming when we looked across all microRNAs. To make sure that we weren't just identifying sets of co-expressed mRNAs, we also compared the GenMiR++ sets to sets of mRNAs with similar expression patterns — the difference in enrichment was equally overwhelming.
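Enrichment of a Gene Ontology term in a target set is commonly assessed with a one-sided hypergeometric test. The interview does not say which test the study used, so treat this as an assumed, generic example: it computes the probability of seeing at least `k` annotated genes in a set of `n` targets drawn from `N` genes, `K` of which carry the term.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # X cannot exceed the set size or the number annotated
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy example: 2 of 3 targets carry a term held by 2 of 5 genes overall.
    print(hypergeom_enrichment_p(k=2, n=3, K=2, N=5))
```

Running this for each miRNA's GenMiR++ set, its TargetScanS set, and a co-expression-matched set of the same size would give the three kinds of comparison described above. With many miRNAs and GO terms, the p-values would also need multiple-testing correction.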
Are there any limitations to the procedure, or areas that require additional fine-tuning?
One thing that we are working on is using GenMiR++ to predict miRNA targets that are regulated by translational repression.
A very similar principle holds: if an miRNA regulates a target gene by repressing its translation, then there should be an inverse relationship between the miRNA and the gene's protein product once the gene's mRNA levels are taken into consideration. It's much harder to get protein expression data, though.
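One simple way to "take the mRNA levels into consideration" is to regress each protein profile on its own mRNA profile and then ask whether the leftover protein signal is anticorrelated with the miRNA. This is a hypothetical sketch of that residual approach, assuming matched miRNA, mRNA, and protein measurements per sample; it is not the lab's model.

```python
import math


def _mean(v: list[float]) -> float:
    return sum(v) / len(v)


def residuals(y: list[float], x: list[float]) -> list[float]:
    """Residuals of y after ordinary least-squares regression on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("mRNA profile is constant; cannot regress on it")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def translational_score(mirna: list[float], mrna: list[float],
                        protein: list[float]) -> float:
    """Negated Pearson correlation between the miRNA and the part of the protein
    profile NOT explained by the gene's mRNA; higher suggests translational repression."""
    r = residuals(protein, mrna)
    mm, mr = _mean(mirna), _mean(r)
    num = sum((a - mm) * (b - mr) for a, b in zip(mirna, r))
    den = math.sqrt(sum((a - mm) ** 2 for a in mirna) * sum((b - mr) ** 2 for b in r))
    if den == 0:
        raise ValueError("miRNA profile or protein residuals are constant")
    return -num / den


if __name__ == "__main__":
    mirna = [1.0, 0.0, 0.0, 1.0]    # toy miRNA profile across four samples
    mrna = [1.0, 2.0, 3.0, 4.0]     # target mRNA rises steadily
    protein = [0.0, 2.0, 3.0, 3.0]  # protein = mRNA minus a miRNA-driven dip
    print(translational_score(mirna, mrna, protein))
```

In the toy data, the protein dips exactly where the miRNA is present even though the mRNA does not, so the score comes out at 1.0.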
Is the target-identification procedure available to the research community? If so, where?
We've published our target predictions for human online. Also, computationally savvy researchers can download [Huang’s] MATLAB code and run it to make their own predictions if they have miRNA and mRNA data from matched samples.
What are the next steps for the lab after the publication of the paper? Will more work be conducted? Is the lab working on any other miRNA-related projects?
Besides the work I talked about above, we are working on incorporating other more recently identified features of miRNA target sites, like accessibility, into our model to improve our accuracy and our coverage.