CHICAGO – A first-of-its-kind atlas of "ramp sequences" near the 5' end of highly expressed genes promises to help researchers better understand gene expression and disease development.
Bioinformaticians and computational biologists at the University of Kentucky and Brigham Young University recently released the Ramp Atlas, a compendium of 18,388 tissue- and cell type-specific ramp sequences covering 62 tissues and 66 cell types. The resource, published in a paper in NAR Genomics and Bioinformatics in May, also features interactive comparisons for SARS-CoV-2 ramp sequences.
Codon usage depends on the local availability of tRNAs and RNA binding proteins, which can differ between tissues and cell types. As a result, the efficiency with which codons are translated can vary, even though the underlying mRNA sequence remains the same.
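To make that point concrete, here is a minimal sketch that scores one unchanged mRNA against the tRNA pools of two hypothetical tissues. It assumes made-up tRNA abundances and a simplified, tAI-like weighting that ignores wobble pairing. The names `TRNA_ABUNDANCE`, `codon_weights` and `mean_efficiency` are illustrative only and do not come from the Ramp Atlas or ExtRamp.

```python
from math import exp, log

# Hypothetical relative tRNA abundances in two tissues (made-up numbers).
# Each codon is paired with the abundance of a tRNA that decodes it; wobble
# pairing, which measures such as the tRNA adaptation index handle, is ignored.
TRNA_ABUNDANCE = {
    "liver": {"GCU": 10.0, "GCC": 2.0, "AAA": 8.0, "AAG": 4.0},
    "brain": {"GCU": 2.0, "GCC": 10.0, "AAA": 1.0, "AAG": 5.0},
}

def codon_weights(abundance):
    """Scale each codon's tRNA abundance to (0, 1] relative to the most abundant one."""
    top = max(abundance.values())
    return {codon: value / top for codon, value in abundance.items()}

def split_codons(cds):
    """Split a coding sequence into codons; its length must be a multiple of 3."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]

def mean_efficiency(cds, weights):
    """Geometric mean of per-codon weights, the averaging used by CAI/tAI-style indices."""
    codons = split_codons(cds)
    return exp(sum(log(weights[c]) for c in codons) / len(codons))

mrna = "GCUGCCAAAAAG"  # one sequence, scored against each tissue's tRNA pool
for tissue, abundance in TRNA_ABUNDANCE.items():
    print(tissue, round(mean_efficiency(mrna, codon_weights(abundance)), 3))
# prints "liver 0.503", then "brain 0.316"
```

The same four codons score about 0.50 in the hypothetical liver and about 0.32 in the hypothetical brain, purely because the assumed tRNA supply differs.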
A "ramp" of codons that are slowly translated may appear at the beginning of highly expressed genes in order to space out ribosomes translating those genes. This helps avoid downstream ribosome collisions that can cause ribosome-associated protein quality control-mediated decay. Such a ramp has the effect of increasing protein levels.
Justin Miller, director of pathology bioinformatics at UK and one of three co-first authors of the paper, said that the publication of the Ramp Atlas represents the culmination of several years of work, which included the development of a web portal called CUBAP (Codon Usage Bias Across Populations) that examines how natural genetic variation among populations can affect ramp sequences. "From there, we hypothesized that ramp sequences would also change between different tissues and cells, which then led to the creation of the Ramp Atlas," Miller said in an email.
The initial version of the Ramp Atlas includes data on 3,108 genes whose ramp sequences vary between tissues and cell types even though the underlying gene sequence does not change, a variation that can lead to differences in expression.
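As a sketch of what "varying between tissues" means in practice, the snippet below assumes hypothetical per-tissue ramp calls stored as sets of gene names and keeps the genes that have a ramp in at least one tissue but not in every tissue. The gene names, tissues and the function `tissue_variable_ramp_genes` are hypothetical; this is not how the Ramp Atlas derived its count of 3,108 genes.

```python
# Hypothetical ramp calls: tissue -> genes with a detected ramp sequence there.
RAMP_CALLS = {
    "liver": {"GENE_A", "GENE_B", "GENE_C"},
    "lung": {"GENE_A", "GENE_C"},
    "colon": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
}

def tissue_variable_ramp_genes(ramp_calls):
    """Genes with a ramp in at least one tissue but not in all of them."""
    per_tissue = list(ramp_calls.values())
    if not per_tissue:
        return set()  # no tissues, so nothing can vary
    in_any = set().union(*per_tissue)
    in_all = set.intersection(*per_tissue)
    return in_any - in_all

print(sorted(tissue_variable_ramp_genes(RAMP_CALLS)))  # ['GENE_B', 'GENE_D']
```

GENE_A and GENE_C have a ramp everywhere, so only GENE_B and GENE_D count as tissue-variable in this toy example.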
Miller said that ramp sequences "indicate the part of the gene sequence at the 5' end that is an outlier region of decreased codon efficiency relative to the rest of the gene." He earned a Ph.D. in biology and informatics at BYU and later was a postdoctoral researcher there before moving to Kentucky.
Miller has been working on ramp sequences with BYU computational biologist Perry Ridge, another author of the NAR Genomics and Bioinformatics paper, since reading what he called the "seminal work" on the subject, a 2010 article in Cell. He said that he and Ridge were "fascinated by how synonymous codon usages affect protein and transcript levels."
In their recent paper, Miller and his colleagues wrote that ramp sequences, which are present in about 10 percent of human genes, occur when about 20 to 40 "suboptimal" codons are concentrated at the 5' end of highly expressed genes. "Ramp sequences increase overall translational efficiency by utilizing slowly translated codons at the beginning of genes," they said.
They explained that ramp sequences can be calculated by identifying statistical outliers of codon efficiency at the 5' end of gene coding sequences. The researchers ran these calculations with homegrown software called ExtRamp, which detects ramp sequences within genes.
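The outlier idea can be sketched in a few lines. This is a minimal illustration, not ExtRamp: it assumes per-codon efficiencies have already been computed, averages them with a geometric mean over a sliding window (9 codons here, an arbitrary choice), flags windows below a lower Tukey fence (Q1 - 1.5 x IQR) as outliers, and calls the run of outlier windows starting at the 5' end a ramp. ExtRamp's actual window size, averaging and outlier statistics may differ, and the function names are hypothetical.

```python
from math import exp, log
from statistics import quantiles

def window_geometric_means(efficiencies, window):
    """Geometric mean of codon efficiencies (all > 0) in every window of `window` codons."""
    logs = [log(e) for e in efficiencies]
    return [
        exp(sum(logs[i:i + window]) / window)
        for i in range(len(logs) - window + 1)
    ]

def find_ramp(efficiencies, window=9, k=1.5):
    """Return (start, end) codon indices of a 5' ramp, or None if none is found.

    A window is an outlier if its geometric mean lies below the lower Tukey
    fence Q1 - k * IQR of all window means. The ramp is the run of outlier
    windows beginning with the first window; it ends where the first
    non-outlier window starts.
    """
    means = window_geometric_means(efficiencies, window)
    if len(means) < 2:
        return None  # too few windows to estimate quartiles
    q1, _, q3 = quantiles(means, n=4)
    fence = q1 - k * (q3 - q1)
    end = 0
    while end < len(means) and means[end] < fence:
        end += 1
    if end == 0 or end == len(means):
        return None  # no slow 5' region, or no efficient region to contrast it with
    return (0, end)

gene = [0.2] * 12 + [1.0] * 60  # 12 slow codons followed by 60 efficient ones
print(find_ramp(gene))          # (0, 12): codons 0 through 11
print(find_ramp([1.0] * 72))    # None
```

Ending the ramp where the first "normal" window begins, rather than where the last outlier window ends, keeps the call from spilling into efficient codons that merely share a window with the last slow one; a real tool may resolve that boundary differently.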
A team led by Miller described the ExtRamp algorithm in a 2019 paper in Nucleic Acids Research. The group has since created an online version of ExtRamp that is accessible to researchers without bioinformatics skills.
In the new article, they called the Ramp Atlas a "template for conducting ramp sequence analyses on viruses and identifying ramp sequences that are correlated with tissue- or cell type-specific differential gene expression." They said this is the first time that ramp sequences have been described via single-cell codon efficiencies across human tissues and cell types.
"Because tissues and cell types have distinct [transfer RNA] levels and codon usage biases, we hypothesized that they would also have distinct ramp sequences, despite having no differences in the underlying genetic code," they wrote. They believe that the atlas will improve understanding of how ramp sequences relate to tissue- and cell type-specific gene expression and, ultimately, help in predicting human health and disease.
"Since tRNA concentrations change between cells, the Ramp Atlas allows researchers to see how those changes affect ramp sequences and gene expression … [and] allows researchers to see if a ramp sequence occurs in any human or COVID-19 gene in any tissue or cell," Miller told GenomeWeb.
"We anticipate that the Ramp Atlas will facilitate future ramp sequence analyses on tissue or cell type-specific gene expression impacted by ramp sequences, genetic variant effects on tissue-specific gene expression, viral adaptations to specific tissues or cell types, and therapeutic developments that aim to modulate tissue-specific gene expression," the authors wrote.
The Ramp Atlas was built from all available tissue and cell type expression data in the Human Protein Atlas, Genotype-Tissue Expression (GTEx), and FANTOM5 datasets.
The Ramp Atlas uses Tableau visualization software to add a column to Human Protein Atlas data indicating whether each gene has a ramp sequence. Users can compare this information with normalized gene expression for each tissue type to explore whether the presence or absence of a ramp sequence affects expression levels.
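The comparison of a ramp flag with expression can be sketched as follows, assuming hypothetical genes, made-up normalized expression values for a single tissue, and a boolean flag standing in for the atlas's extra column. The function `median_expression_by_ramp` is hypothetical and is not part of the Ramp Atlas or Tableau. It reports median expression for genes with and without a ramp; a difference between the groups shows an association, not proof that the ramp caused it.

```python
from statistics import median

# Hypothetical normalized expression values for one tissue, plus a hypothetical
# per-gene ramp flag standing in for the atlas's extra "has ramp" column.
expression = {"GENE_A": 120.0, "GENE_B": 15.0, "GENE_C": 80.0, "GENE_D": 5.0}
has_ramp = {"GENE_A": True, "GENE_B": False, "GENE_C": True, "GENE_D": False}

def median_expression_by_ramp(expression, has_ramp):
    """Median expression of genes with (True) and without (False) a ramp sequence."""
    groups = {True: [], False: []}
    for gene, value in expression.items():
        if gene in has_ramp:  # genes without a ramp call are skipped
            groups[has_ramp[gene]].append(value)
    return {flag: median(values) for flag, values in groups.items() if values}

print(median_expression_by_ramp(expression, has_ramp))  # {True: 100.0, False: 10.0}
```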
The atlas presents data on tissue and cell type-specific ramp sequences side by side with gene expression data, making it easy for investigators to study how ramp sequences affect gene expression.
The ExtRamp algorithm calculates ramp sequences by comparing tRNA abundances and codon usage biases. The system defaults to the GRCh38 human reference genome, though researchers can upload any reference genome to support the calculation. Miller said that the Ramp Atlas team is now exploring how CHM13 and other reference genomes could affect ramp sequences.
With the Ramp Atlas, the BYU-Kentucky team identified ramp sequences in seven SARS-CoV-2 genes and found that tissues with high levels of coronavirus proliferation had "significantly" more virus and human entry factor genes with ramp sequences than tissues with lower levels of proliferation. This was particularly evident in the rectum and duodenum, they said.
The search function of the atlas allows users to query ramp sequences in the SARS-CoV-2 genome and in seven human entry factors for the virus that causes COVID-19.
The researchers claimed that this is the first time anyone has identified ramp sequences in SARS-CoV-2 genomes. "We show that the tissues with ramp sequences significantly intersect with tissues known to have higher rates of viral infection and proliferation," they said.
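The article does not say which statistical test the authors used for this intersection. One standard way to ask whether two sets of tissues overlap more than chance would predict is a one-sided hypergeometric test, sketched here with only the Python standard library. The function name and the tissue counts in the example are hypothetical.

```python
from math import comb

def hypergeometric_overlap_p(total, size_a, size_b, overlap):
    """One-sided p-value P(X >= overlap) under a hypergeometric null.

    total:   number of items in the universe (e.g., tissues profiled)
    size_a:  items in the first set (e.g., tissues where a ramp is present)
    size_b:  items in the second set (e.g., tissues with high viral infection)
    overlap: observed number of items in both sets
    """
    upper = min(size_a, size_b)
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    hits = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(total, size_b)

# Hypothetical counts: 10 tissues, 4 with a ramp, 4 highly infected, all 4 shared.
print(hypergeometric_overlap_p(10, 4, 4, 4))  # 1/210, about 0.0048
```

The test treats every tissue as equally likely to fall into either set, an assumption that real tissue data may not satisfy.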
"We anticipate that the Ramp Atlas will allow future studies to identify how specific ramp sequences within single transcripts contribute to single-cell gene expression," according to the paper. "We also anticipate a wider adaption of ramp sequences in viral and disease research that are facilitated online through the Ramp Atlas."
The researchers suggested that ramp sequences may play an outsize role in determining tissue-specific coronavirus infection and proliferation, a possibility that warrants further research, particularly for the ACE2, TMPRSS2, and CTSL entry factors.
"While both TMPRSS2 and CTSL contain ramp sequences, only ramp sequences in TMPRSS2 are tissue-specific and therefore may influence which tissues are most infected by SARS-CoV-2," they wrote.
The SARS-CoV-2 study is meant to serve as a framework for future online analyses of tissue- and cell type-specific ramp sequences, including for novel viruses.
In an email, the lead author of the 2010 Cell paper, Tamir Tuller of the Weizmann Institute of Science in Israel, called the Ramp Atlas "helpful to many researchers in the field." However, he said that it is unlikely that he and his team will use it because his lab already has "more sophisticated" algorithms for identifying ramp sequences.
Tuller said that the initial version of the Ramp Atlas does have some shortfalls that need to be addressed in subsequent releases. Notably, he said that ribosomal speed depends on multiple factors, including local mRNA folding, interactions between mRNA and ribosomal RNA, and interactions between the nascent amino acid chain and the ribosomal exit tunnel.
"It is impossible to infer it based only on codon frequencies," he said.
Tuller would also like the atlas to be able to infer different aspects of ramps separately, such as those related to translation initiation and to elongation.