NEW YORK (GenomeWeb) – Short tandem repeats, one of the most abundant classes of repeats, contribute to gene expression variation in humans, according to researchers from the New York Genome Center and elsewhere.
There are some 700,000 STR loci in the human genome, but much of the variation at these loci has been thought to be neutral. Now, through a genome-wide survey, NYGC's Yaniv Erlich and his colleagues uncovered more than 2,000 STRs linked to expression changes, as they reported in Nature Genetics today. They further estimated that such expression STRs, or eSTRs, contribute between 10 percent and 15 percent of the cis heritability mediated by all common variants, and found that eSTRs are enriched in a number of clinical conditions.
"Our work expands the repertoire of functional genetic elements," Erlich, who is also an assistant professor at Columbia University, said in a statement. "We expect our findings will lead to a better understanding of disease mechanisms and perhaps eventually help to identify new drug targets."
Erlich and colleagues drew upon a set of 311 Europeans whose lymphoblastoid cell line expression profiles had been analyzed as part of the gEUVADIS project and whose whole genomes had been sequenced by the 1000 Genomes Project.
Using lobSTR, an approach they previously developed to profile STRs from next-gen sequencing data, Erlich and his team created a catalog of STR variation in this cohort based on the 1000 Genomes Project data. They then regressed the gene expression data on STR dosage to search for eSTR associations, uncovering some 2,060 unique protein-coding genes with a significant eSTR.
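The regression step can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes STR dosage is the sum of an individual's two allele repeat counts, it fits a plain least-squares line, and the genotypes and expression values are simulated. The function names `str_dosage` and `regress_expression_on_dosage` are hypothetical.

```python
import numpy as np

def str_dosage(allele_a, allele_b):
    """Assumed definition: dosage is the sum of the two allele repeat counts."""
    return np.asarray(allele_a, dtype=float) + np.asarray(allele_b, dtype=float)

def regress_expression_on_dosage(dosage, expression):
    """Fit expression = intercept + slope * dosage by ordinary least squares."""
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(expression, dtype=float)
    x_c = x - x.mean()
    y_c = y - y.mean()
    slope = (x_c @ y_c) / (x_c @ x_c)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    # Fraction of expression variance explained by STR dosage.
    r_squared = 1.0 - (residuals @ residuals) / (y_c @ y_c)
    return slope, intercept, r_squared

# Simulated data only: 311 individuals, one STR locus, one gene.
rng = np.random.default_rng(0)
n = 311
allele_a = rng.integers(10, 20, size=n)  # repeat counts, first allele
allele_b = rng.integers(10, 20, size=n)  # repeat counts, second allele
dosage = str_dosage(allele_a, allele_b)
expression = 0.3 * dosage + rng.normal(0.0, 1.0, size=n)  # made-up effect size

slope, intercept, r2 = regress_expression_on_dosage(dosage, expression)
print(f"slope={slope:.3f} r2={r2:.3f}")
```

A genome-wide scan would repeat a fit like this for every STR and nearby gene pair and then correct for multiple testing, which this sketch leaves out.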
Most of the eSTRs they uncovered were di- or tetra-nucleotide repeats. About a dozen of these eSTRs fell in coding exons, while the remainder were enriched in 5'UTRs, 3'UTRs, and regions close to genes, as compared to other STRs.
They further found that STR variation could account for a good chunk of variation in gene expression. By partitioning the relative contributions of eSTRs versus those of common biallelic SNPs, indels, and structural variations in the cis region of each gene using a linear mixed model approach, the researchers reported that eSTRs contribute about 12 percent of the genetic variance that's attributed to common cis polymorphisms.
In addition, after performing a traditional eQTL analysis based on the whole genomes from this cohort, the researchers found 4,290 genes with an eSTR within 100 kilobases. They then re-analyzed the eSTR association by conditioning it upon the genotype of the most significant eSNP for each gene. From this, they found hundreds of eSTRs influence gene expression beyond what's attributable to the lead eSNP.
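One way to picture the conditioning step is to compare a model containing only the lead eSNP with a model that also includes the STR. The sketch below does this with ordinary least squares on simulated data. It is an assumption about the general form of such an analysis, not the authors' exact procedure, and the function names are hypothetical.

```python
import numpy as np

def fit_rss(design, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, float(resid @ resid)

def conditional_estr_effect(expression, snp_genotype, str_dosage):
    """STR effect on expression after accounting for the lead eSNP genotype."""
    y = np.asarray(expression, dtype=float)
    snp = np.asarray(snp_genotype, dtype=float)
    dosage = np.asarray(str_dosage, dtype=float)
    ones = np.ones_like(y)
    # Baseline model: expression ~ intercept + lead eSNP.
    _, rss_snp = fit_rss(np.column_stack([ones, snp]), y)
    # Full model: expression ~ intercept + lead eSNP + STR dosage.
    coef, rss_full = fit_rss(np.column_stack([ones, snp, dosage]), y)
    # Share of the variance left after the eSNP that the STR explains.
    extra_explained = (rss_snp - rss_full) / rss_snp
    return float(coef[2]), extra_explained

# Simulated data only: the STR is given an effect independent of the eSNP.
rng = np.random.default_rng(1)
n = 311
snp = rng.integers(0, 3, size=n)       # 0, 1, or 2 copies of the alternate allele
dosage = rng.integers(20, 40, size=n)  # summed repeat counts
expression = 0.5 * snp + 0.2 * dosage + rng.normal(0.0, 1.0, size=n)

str_coef, extra = conditional_estr_effect(expression, snp, dosage)
print(f"STR coefficient={str_coef:.3f} extra variance explained={extra:.3f}")
```

If the STR signal were only a proxy for the eSNP, the extra variance explained would shrink toward zero once the eSNP is in the model.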
eSTRs also appear to have functional roles, Erlich and his colleagues reported. For instance, as compared to random STRs, eSTRs are more likely to be found in regions that have undergone purifying selection, and they are enriched near transcription start sites and in peaks for histone modifications associated with regulatory regions. At the same time, they are depleted in repressed regions.
The researchers also noted that variations in eSTR length seemed to modulate the presence of certain histone marks.
To examine whether eSTRs might influence various human conditions, Erlich and his colleagues sifted through the National Human Genome Research Institute GWAS catalog to see whether the genes listed there were enriched for eSTR-associated genes. They focused in particular on seven complex disorders (rheumatoid arthritis, Crohn's disease, type 1 diabetes, type 2 diabetes, blood pressure, bipolar disorder, and coronary artery disease) and found that the GWAS genes for Crohn's disease were significantly enriched for eSTRs, while those linked to rheumatoid arthritis exhibited moderate enrichment.
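An enrichment question of this kind is often framed as a hypergeometric test: given how many genes carry an eSTR overall, how surprising is the overlap with a disease's GWAS genes? The article does not say which test the researchers used, so the sketch below is only one plausible framing. The function name is hypothetical, and every count in the example except the 2,060 eSTR genes is invented.

```python
from math import comb

def hypergeom_upper_tail(total_genes, estr_genes, gwas_genes, overlap):
    """Probability of seeing at least `overlap` eSTR genes among `gwas_genes`
    genes drawn without replacement from `total_genes`, of which
    `estr_genes` carry an eSTR."""
    denom = comb(total_genes, gwas_genes)
    max_k = min(estr_genes, gwas_genes)
    tail = sum(
        comb(estr_genes, k) * comb(total_genes - estr_genes, gwas_genes - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / denom

# Hypothetical counts for illustration: 15,000 tested genes, 2,060 of them
# with an eSTR, 100 GWAS genes for one disease, 25 of which carry an eSTR.
p_value = hypergeom_upper_tail(15000, 2060, 100, 25)
print(f"enrichment p-value: {p_value:.3g}")
```

The exact-integer arithmetic avoids any dependence on a statistics library, at the cost of being slow for very large gene sets.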
In a separate Twins UK cohort, the researchers uncovered 12 significant associations between eSTRs and various clinical phenotypes. Only one of these associations, they noted, overlapped with a known GWAS hit. The other 11 were involved in blood metabolite changes and physical traits.
Altogether, Erlich and his colleagues said, these findings suggest eSTRs are enriched in clinical phenotypes.
"We've known that STRs are known to play a role in these diseases, but no one has ever conducted a genome-wide scan to find their effect on complex traits," Erlich added. "If we want to do personalized medicine, we really need to understand every part of the genome, including repeat elements — there's a lot of exciting biology ahead."