Harvard Team Combines SNP-Seq, FREP to Identify Non-Coding Functional SNPs

NEW YORK (GenomeWeb) – A Harvard Medical School-led team has developed an approach to tease out which SNPs uncovered in genome-wide association studies might be functional.

While GWAS have uncovered scores of SNPs associated with disease, determining which may be functional has been difficult. As they described in Nature Genetics today, researchers led by Harvard's Peter Nigrovic combined SNP-sequencing (SNP-seq) with flanking restriction-enhanced pulldown (FREP) to identify SNPs bound by regulatory proteins. They applied this approach to find three SNPs that affect the regulation of the CD40 locus and to identify nearly 150 candidate functional SNPs for juvenile idiopathic arthritis.

"Together, these findings establish the utility of tandem SNP-seq/FREP to bridge the gap between GWAS and disease mechanism," Nigrovic and his colleagues wrote in their paper.

SNP-seq relies on type IIS restriction enzymes, such as BpmI, that can be directed to bind certain SNPs and cut at a set distance from the binding site. But if a regulatory protein has bound that SNP, the construct is protected from cleavage.

By incubating a library of these SNP-restriction enzyme constructs with the nuclear extract of disease-related cells — which contains relevant regulatory proteins — researchers can then determine which SNPs are bound by regulatory proteins and which are not. Those that are bound can be amplified by PCR for sequencing, while those that are not are cut and cannot be amplified. That way, potentially functional SNPs may be identified.
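The selection logic described above can be illustrated with a minimal sketch. The article does not say how the researchers score sequencing reads, so everything here is an assumption made for illustration: the function name `candidate_fsnps`, the comparison of read counts with and without nuclear extract, the `min_ratio` threshold, the pseudocount, and the SNP labels are all hypothetical and are not the authors' pipeline.

```python
def candidate_fsnps(reads_with_extract, reads_without_extract,
                    min_ratio=2.0, pseudocount=1.0):
    """Return SNP IDs whose constructs look protected from digestion.

    Hypothetical scoring, not taken from the paper: a construct bound by a
    regulatory protein escapes cleavage and is amplified, so it should yield
    more reads when nuclear extract is present than in a no-extract control.
    """
    candidates = []
    for snp_id, protected_reads in reads_with_extract.items():
        baseline_reads = reads_without_extract.get(snp_id, 0)
        # The pseudocount avoids division by zero for constructs with no control reads.
        ratio = (protected_reads + pseudocount) / (baseline_reads + pseudocount)
        if ratio >= min_ratio:
            candidates.append(snp_id)
    return sorted(candidates)


# Made-up read counts for three placeholder SNPs.
with_extract = {"snpA": 900, "snpB": 40, "snpC": 500}
without_extract = {"snpA": 50, "snpB": 45, "snpC": 400}
print(candidate_fsnps(with_extract, without_extract))
```

With these made-up counts, only `snpA` clears the assumed twofold threshold. A real analysis would need replicates and a statistical test rather than a fixed ratio.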

Then, FREP can be used to determine which proteins bind the candidate functional SNPs (fSNPs) by using the fSNPs as baits to pull down the regulatory protein.

To test this approach on the CD40 locus, which is associated with rheumatoid arthritis, multiple sclerosis, and lupus, the researchers engineered risk alleles from 11 SNPs that are in linkage disequilibrium into separate SNP-seq constructs, which they then incubated with nuclear extract from human BL2 B cells. After restriction enzyme digestion and PCR amplification, they uncovered three candidate SNPs.

The researchers validated those three SNPs as noncoding fSNPs for CD40 through a luciferase reporter assay, an electrophoretic mobility shift assay (EMSA), and CRISPR-Cas9 perturbation of the sequences in BL2 cells.

When they compared their SNP-seq findings to an in silico analysis relying on the web-based tool HaploReg, they found that SNP-seq could identify fSNPs not predicted by that tool. In addition, they noted that SNP-seq could detect most, though not all, candidate noncoding fSNPs determined by EMSA.

After FREP pulldown, the researchers identified four proteins that appeared to bind these three SNPs and affect their regulation.

After showing that their approach could capture potential fSNPs, the researchers applied it to screen 608 SNPs in linkage disequilibrium with 27 loci associated with juvenile idiopathic arthritis, a condition that affects about 1 in 1,000 children in the US. After curation, they identified 148 candidate fSNPs for 25 of these JIA loci.

Four of these candidate fSNPs were in the STAT4 locus, which the researchers noted has been implicated in rheumatoid arthritis and type 1 diabetes. Using additional luciferase reporters, EMSA, and CRISPR-based assays, they homed in on two SNPs where mutations reduced STAT4 expression. Using FREP, they pulled down the H1.2 and SATB proteins, suggesting that those SNPs influence STAT4 expression via those regulatory proteins.

Overall, the researchers estimated that their approach has a true-positive rate of about 76 percent and a false-negative rate between 14 percent and 24 percent.

The approach has a number of limitations, the researchers noted. Most notably, it only identifies fSNPs if the related regulatory proteins are present in the nuclear extract used.

Still, they argued that their SNP-seq and FREP approach, when combined with other methods, "could accelerate the understanding of human disease pathogenesis through population-level genetics."