NEW YORK — Researchers were able to glean tissue-specific gene expression from individuals' whole blood transcriptomes, according to a new study.
Knowing tissue-specific gene expression can provide insight into disease, but many disease-affected tissues are not easily accessible for sampling and analysis. Researchers from the US National Cancer Institute instead sought to determine whether tissue-specific expression can be gauged from more easily obtained blood samples.
As they reported on Friday in Science Advances, NCI's Sridhar Hannenhalli and his colleagues used Genotype-Tissue Expression (GTEx) project data to test models for predicting tissue-specific gene expression from whole-blood samples. Across more than 30 tissue types, they found that one of their models could predict tissue-specific expression levels for about 60 percent of genes, and for an even higher proportion of genes in skeletal muscle. The predicted expression levels could serve as prognostic disease markers, the researchers said, and they have developed a software pipeline for predicting tissue-specific gene expression dubbed Tissue Expression Estimation using Blood Transcriptome, or TEEBoT.
"In principle, TEEBoT could be used in a variety of complex disorders where there are current ongoing efforts to build tissue-based expression biomarkers, including chronic aging-related diseases and cancer," the researchers said in an email.
They developed their models using GTEx data for 32 primary tissues, each with samples from at least 65 individuals who also had whole-blood transcriptome and demographic data. For each tissue and gene, they fit three nested regression models to predict tissue-specific gene expression. The main model, dubbed M2, incorporates whole-blood gene expression and whole-blood splicing data as well as the demographic factors of age, race, and sex.
For 17,031 genes across the 32 tissues, the researchers fit their models and estimated their accuracy. The base model, M1, which included whole-blood gene expression and demographic factors, could predict expression for a portion of expressed genes, but adding whole-blood splicing data, as in M2, improved predictions for more than 40 percent of genes. The researchers also tested a third model that folded in SNP data, but found that it improved predictions for only a small number of genes while also requiring whole-genome sequencing.
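To make the nested-model comparison concrete, here is a minimal sketch for a single gene in a single tissue. It is not TEEBoT's code: it assumes plain ordinary least squares, uses simulated donors, stands in a single hypothetical `splice` feature for the whole-blood splicing data, and omits race for brevity. Because M2 simply adds columns to M1, a partial F-statistic measures whether the extra splicing term reduces residual error more than would be expected by chance.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_rss(x, y):
    """Fit y ~ x by ordinary least squares (normal equations); return the residual sum of squares."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(x, y))


def simulate(n=120, seed=0):
    """Simulated (hypothetical) donors: (blood_expr, splice, age, sex, tissue_expr) for one gene."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        blood = rng.gauss(0, 1)   # whole-blood expression of the gene
        splice = rng.gauss(0, 1)  # hypothetical whole-blood splicing feature
        age = rng.uniform(20, 70)
        sex = rng.randint(0, 1)
        tissue = 0.8 * blood + 1.5 * splice + 0.02 * age + 0.3 * sex + rng.gauss(0, 0.5)
        rows.append((blood, splice, age, sex, tissue))
    return rows


def compare_nested(rows):
    """Fit M1 (blood expression + demographics) and M2 (M1 + splicing); return R^2 values and the partial F."""
    y = [r[4] for r in rows]
    x1 = [[1.0, blood, age, sex] for blood, _, age, sex, _ in rows]
    x2 = [[1.0, blood, age, sex, splice] for blood, splice, age, sex, _ in rows]
    rss1, rss2 = ols_rss(x1, y), ols_rss(x2, y)
    n, p1, p2 = len(y), len(x1[0]), len(x2[0])
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    # Partial F: drop in RSS per added parameter, relative to M2's residual variance.
    f_stat = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
    return {"r2_m1": 1 - rss1 / tss, "r2_m2": 1 - rss2 / tss, "f": f_stat}


if __name__ == "__main__":
    print(compare_nested(simulate()))
```

Because M2 nests M1, its in-sample R² can never be lower, which is why a statistical test or held-out evaluation, rather than raw R², is the fairer way to ask whether splicing data help. Converting the F-statistic to a p-value needs the F distribution (for example `scipy.stats.f.sf`), left out here to keep the sketch dependency-free.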
On average across tissues, the M2 model significantly predicted tissue-specific gene expression for 59 percent of genes, rising to 81 percent of genes in skeletal muscle.
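The article does not say how "significant" was defined for each gene. When thousands of genes are tested per tissue, one common approach is to control the false discovery rate with the Benjamini-Hochberg procedure and report the share of genes that pass. The sketch below assumes that approach, a hypothetical 5 percent FDR threshold, and made-up p-values, purely to show how a per-tissue percentage and its average across tissues could be computed.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Mark which p-values pass Benjamini-Hochberg FDR control at level alpha (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    passed = [False] * m
    # Every gene ranked at or below the cutoff is declared significant.
    for idx in order[:cutoff]:
        passed[idx] = True
    return passed


def fraction_predictable(pvalues_by_gene, alpha=0.05):
    """Share of genes in one tissue whose model fit passes FDR control."""
    if not pvalues_by_gene:
        return 0.0
    passed = benjamini_hochberg(list(pvalues_by_gene.values()), alpha)
    return sum(passed) / len(passed)


def mean_fraction(tissues, alpha=0.05):
    """Average the per-tissue fractions; input is {tissue: {gene: p_value}}."""
    fractions = [fraction_predictable(genes, alpha) for genes in tissues.values()]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Hypothetical p-values, not taken from GTEx or the study.
    toy = {
        "tissue_a": {"GENE1": 0.001, "GENE2": 0.01, "GENE3": 0.02, "GENE4": 0.2, "GENE5": 0.5},
        "tissue_b": {"GENE1": 0.04, "GENE2": 0.04},
    }
    print(mean_fraction(toy))
```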
The researchers also noted that genes whose tissue-specific expression was predictable shared certain characteristics: they tended to be involved in fundamental cellular processes and to have greater connectivity with other genes, including housekeeping genes.
The researchers found that this model's predictions were almost as good as expression levels measured directly in specific tissues at predicting certain disease states. For diseases annotated in GTEx with sufficient samples, they compared how well actual tissue-specific gene expression, predicted tissue-specific gene expression, and whole-blood gene expression could predict disease status. Predicted tissue-specific expression outperformed whole-blood expression and was comparable to actual tissue-specific expression, they found.
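The disease comparison can be pictured as scoring the same donors three ways and asking which score best separates cases from controls. The article does not name the metric; the sketch below assumes the area under the ROC curve (AUC), computed with the Mann-Whitney formulation, and the toy scores are invented solely to exercise the code, not taken from GTEx or the study. In a real analysis each score would come from a classifier trained on one feature set and evaluated on held-out donors.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count as half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def compare_feature_sets(score_sets, labels):
    """Return the AUC for each named feature set's disease-risk scores."""
    return {name: roc_auc(scores, labels) for name, scores in score_sets.items()}


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]  # 1 = hypothetical case, 0 = hypothetical control
    toy_scores = {  # invented scores, one list per feature set
        "measured_tissue": [0.9, 0.8, 0.6, 0.5, 0.2, 0.1],
        "predicted_tissue": [0.9, 0.4, 0.6, 0.5, 0.2, 0.1],
        "whole_blood": [0.6, 0.3, 0.5, 0.5, 0.4, 0.2],
    }
    for name, auc in compare_feature_sets(toy_scores, labels).items():
        print(f"{name}: AUC = {auc:.3f}")
```

An AUC of 0.5 means a score separates cases from controls no better than chance, while 1.0 means perfect separation.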
The researchers have additionally made the code for their TEEBoT pipeline publicly available. They are further working to improve it by folding in additional proteomic and metabolomic data and benchmarking it on newly available datasets. At the same time, they are exploring its applicability to predicting tumor transcriptomes from cancer patients' blood transcriptomes.