
New Methods Enable Quantitative Sequencing of Formyl-C, Study of Biological Role


Scientists in the UK have developed a new method for reading a recently discovered cytosine modification, 5-formylcytosine, at single-base resolution.

The approach, called reduced bisulfite sequencing, or redBS-seq, provides an alternative to another method, fCAB-seq, that was published by researchers in the US last year.

Together, the two methods expand researchers' toolkit for studying epigenetic cytosine modifications, which also include methylation and hydroxymethylation, and should help them elucidate the potential biological roles of these marks in mammalian cells.

"I believe that formyl-C is important and it will start to feature in a number of biological contexts in work that will roll out over the next couple of years," said Shankar Balasubramanian, whose group developed redBS-seq and published it in Nature Chemistry last week.

Balasubramanian, a professor of medicinal chemistry at the University of Cambridge, and his colleagues also invented the Solexa sequencing method that lies at the heart of Illumina's sequencing platforms.

Two years ago, his group published a method called oxidative bisulfite sequencing, or oxBS-seq, for quantifying and distinguishing 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) at single-base resolution.

Cambridge Epigenetix, a University of Cambridge spinout co-founded by Balasubramanian, offers oxBS-seq as a commercial kit and plans to commercialize redBS-seq as well.

Shortly after 5hmC was discovered, 5-formylcytosine (5fC) was reported in the literature, "so we immediately started thinking about ways of sequencing that modification as well," Balasubramanian told In Sequence.

Both 5fC and 5-carboxylcytosine (5caC) are present at low levels in the genome and are widely considered to be intermediates on the path from methylated to unmethylated cytosine. But they may be more than just intermediates and could also serve as epigenetic signals. "I've always questioned why it is these modifications exist and persist," Balasubramanian said.

"There's been a tendency to [say] 'these rarer modifications can't possibly be important because they're present at such low levels,'" he said, but some appear to occur at relatively high levels in specific areas of the genome and might have biological functions.

"They cannot be ignored and we should not assume anything about them," Balasubramanian said. "Thankfully, now there are good measuring tools to detect them and decode them at single-base resolution."

For 5hmC, studies over the last year or so have already shown its importance in biology and a number of disease areas. "Formyl-C has come a little later, and I think the early indications are that it could be a distinct mark in DNA from hydroxymethyl-C and methyl-C, and one that is differentially recognized by naturally occurring proteins," he said.

To map 5fC, Balasubramanian and his colleagues performed bisulfite sequencing on untreated DNA and on DNA in which 5fC had been chemically reduced to 5hmC. By comparing the results of the two experiments, they could deduce the positions of 5fC.

Under bisulfite treatment, 5fC is converted to uracil, so it reads as a "T" in the sequence. 5fC that has been reduced to 5hmC with borohydride, by contrast, is converted to cytosine 5-methylenesulfonate (CMS), which reads as a "C".
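The comparison can be illustrated with a minimal sketch of the idealized base calls for each cytosine state. It assumes complete bisulfite conversion and complete borohydride reduction, ignores 5caC and sequencing errors, and relies on the standard bisulfite behavior that unmodified C reads as "T" while 5mC and 5hmC read as "C". The names `BS_READOUT`, `REDBS_READOUT`, and `classify` are hypothetical and are not taken from any published redBS-seq software.

```python
# Idealized base calls for each cytosine state after each treatment
# (assumes 100% bisulfite conversion and 100% borohydride reduction).
BS_READOUT = {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T"}
REDBS_READOUT = {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "C"}


def classify(bs_call: str, redbs_call: str) -> str:
    """Classify one cytosine position from its BS-seq and redBS-seq base calls."""
    if bs_call == "T" and redbs_call == "C":
        return "5fC"          # converted in BS-seq, protected after reduction to 5hmC
    if bs_call == "C" and redbs_call == "C":
        return "5mC or 5hmC"  # protected in both; these two cannot be told apart here
    if bs_call == "T" and redbs_call == "T":
        return "C"            # unmodified cytosine is converted in both experiments
    return "inconsistent"     # BS reads C but redBS reads T: not expected in the ideal case


for state in ("C", "5mC", "5hmC", "5fC"):
    print(state, "->", classify(BS_READOUT[state], REDBS_READOUT[state]))
```

Only 5fC changes its call between the two experiments, which is why comparing them isolates that modification; separating 5mC from 5hmC requires a third readout such as oxBS-seq.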

The researchers tested redBS-seq using Illumina sequencers, but the method is also compatible with other sequencing platforms, Balasubramanian said.

While the method is "robust in our hands," he said, the sample prep workflow could be further optimized to make it easier and more efficient, to work with smaller quantities of DNA, and to support different sequencing formats.

According to Chuan He, a professor of chemistry at the University of Chicago, redBS-seq is "a very nice addition to tools available or under development to sequence 5fC." He said his group tried the same approach a while ago but found the reduction not efficient enough to obtain single-base resolution. "Shankar and co-workers have done a very nice work optimizing the reduction approach to make it very efficient," he told In Sequence.

Instead, He's team developed a similar method "that works quite well for us," called chemical modification-assisted bisulfite sequencing for formyl-C, or fCAB-seq, which they published a year ago in Cell.

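The idea behind redBS-seq and fCAB-seq is similar: chemically modify 5fC so that it is protected from bisulfite conversion and reads as "C," then compare the result with standard bisulfite sequencing. But rather than reducing 5fC to 5hmC with borohydride, He and his colleagues protected it with O-ethylhydroxylamine (EtONH2), which they found to result in lower background, according to Peng Jin, a professor of human genetics at Emory University School of Medicine and the co-senior author of last year's study.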

In addition to fCAB-seq, He's team also developed an alternative approach for 5hmC sequencing, called TAB-seq for Tet-assisted bisulfite sequencing, which they published two years ago.

In last week's paper, the Cambridge researchers used redBS-seq, along with oxBS-seq and standard bisulfite sequencing, to generate a single-base-resolution map of three cytosine modifications (5mC, 5hmC, and 5fC) in mouse embryonic stem cells.
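Combining the three readouts amounts, in the idealized case, to a subtraction: the fraction of reads calling "C" at a position reflects 5mC alone in oxBS-seq (where 5hmC is oxidized and then converted), 5mC plus 5hmC in standard bisulfite sequencing, and 5mC plus 5hmC plus 5fC in redBS-seq. The sketch below assumes perfect chemistry, ignores 5caC, and simply clips negative differences from sampling noise to zero; that clipping is a simplification of ours, not the study's statistical method. `Counts`, `non_negative`, and `modification_levels` are hypothetical names, and the read counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Reads calling C or T at one cytosine position in one experiment."""
    c: int
    t: int

    def fraction_c(self) -> float:
        total = self.c + self.t
        if total == 0:
            raise ValueError("no coverage at this position")
        return self.c / total


def non_negative(x: float) -> float:
    """Clip small negative differences (e.g. from sampling noise) to zero."""
    return max(0.0, x)


def modification_levels(bs: Counts, oxbs: Counts, redbs: Counts) -> dict[str, float]:
    """Estimate 5mC, 5hmC and 5fC levels at one position by subtraction.

    Idealized fraction of C calls per experiment:
      oxBS-seq  -> 5mC
      BS-seq    -> 5mC + 5hmC
      redBS-seq -> 5mC + 5hmC + 5fC
    """
    f_ox, f_bs, f_red = oxbs.fraction_c(), bs.fraction_c(), redbs.fraction_c()
    return {
        "5mC": f_ox,
        "5hmC": non_negative(f_bs - f_ox),
        "5fC": non_negative(f_red - f_bs),
    }


# Illustrative counts only, not data from the study.
levels = modification_levels(bs=Counts(60, 40), oxbs=Counts(50, 50), redbs=Counts(70, 30))
print({k: round(v, 2) for k, v in levels.items()})
```

In practice, detecting a small difference such as a 5fC level of a few percent would need deep coverage and a proper statistical test rather than a raw difference of fractions.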

Overall, methylation was present at "reasonably high levels" throughout the genome, while hydroxymethylation was about tenfold less abundant and formyl-C was rarer still, Balasubramanian said.

However, the scientists found regions of the genome where 5fC levels were higher than the other modifications, and other areas where 5hmC levels were highest. "If they're present at high levels, one should ask the question 'Why?'" he said.

The results also suggest that the three modifications are independent of each other, and "just because you measure one of them, you can't presume anything about the levels of the other ones," Balasubramanian added.

As for the functional role of 5fC, a low-resolution map previously published by his group showed that its presence in gene promoter regions correlates with markers of high expression. "That's just a correlation, but if that is a correct observation that others also see, it could be related to transcription," he said.

Another study by his group and collaborators found that a number of proteins, including transcriptional and chromatin regulators, preferentially bind to DNA containing 5fC, suggesting a role in regulating gene expression.

One way this might work is by altering how DNA-binding proteins access the genetic code. "There is a hydrogen bonding pattern in the major groove [of the double helix] that is very important for what the world outside the double helix sees of the DNA code," Balasubramanian explained. "As soon as you put in a methyl group on a C, that alters the code in the major groove. Once you put in hydroxymethyl, that changes it again, and if you put in a formyl, that changes it again."

He added that "in molecular terms, these modifications dynamically alter the code that you see in the major groove of the double helix without changing the primary sequence."

According to the University of Chicago's He, "information about 5fC distribution can be critical to understanding active demethylation in various cell states or cell differentiation [or] development processes." His work and that of another group, he said, showed that 5fC is distributed genome-wide and accumulates at enhancers, "suggesting genome-wide active demethylation in mouse embryonic stem cells."

The Cambridge researchers are now applying redBS-seq in studies of DNA from normal, precancerous, and cancerous tissues. Other studies have already shown that 5hmC tends to be lost as cells progress toward cancer, although not all cancer types have been studied yet. The scientists are now looking for patterns of several cytosine modifications that might suggest a role for them in the onset of cancer.

There might be other base modifications in mammalian genomes out there that still await discovery. "I expect there are more," Balasubramanian said. "We now have the analytical tools to not only discover them and measure them, but also to start thinking about a different kind of whole genome sequence analysis, in which we not only get the canonical bases but also start to get high-resolution patterns of modified bases."

While single-molecule technologies such as nanopore sequencing or Pacific Biosciences' method might someday deliver such a comprehensive genome map, "I have yet to see a report of a sequencing platform technology showing methylation, hydroxymethylation, and formyl, all at single-base resolution," he said.