
NimbleGen’s Jeff Jeddeloh on Long Reads vs. Short Reads for DNA Methylation Analyses

Name: Jeff Jeddeloh
Age: 38
Position: Group Leader, epigenetic technologies, NimbleGen Systems
Experience and Education:
Director, science and technology, Orion Genomics, 2002-2007;
Principal investigator, biological defense research, USAMRIID, US Army, Fort Detrick, Md., 1998-2002;
PhD in molecular genetics, Washington University in St. Louis, 1992-1998;
BA in biology, Washington University in St. Louis, 1988-1992.

Jeff Jeddeloh has been studying DNA methylation for more than ten years, using a variety of methods. He recently joined Roche’s NimbleGen Systems as group leader of epigenetic technologies after spending five years at Orion Genomics as director of science and technology.
Two weeks ago, Jeddeloh and his colleagues published a study in Genome Research, in which they characterized breast cancer-associated methylation patterns in tissue and serum DNA, using 454’s sequencing technology.
In Sequence spoke with Jeddeloh last week about his paper and the choice of different next-generation sequencing technologies for different types of methylation studies.
Give me some background on your breast cancer methylation study. How did the study come about?
The study came about because Orion was keenly interested in developing DNA methylation assays as diagnostic products. We have another publication just about to come out in PLoS One where we detail the discovery of a large number of new breast cancer biomarkers, specifically the loci that we tested in this paper.
For a good number of years, the literature has suggested that tumors may be remotely sensed through the monitoring of methylated molecules circulating in the serum. There is a technology that most investigators use to detect those tumor-shed molecules, called methylation-specific PCR, or MSP. The thing about MSP is, in order to get a PCR product, it requires specific Cs to be methylated. You don’t get data from all of the cytosines, you just get data from some that, at the beginning of the experiment, you decide are important.
We thought it might be more fruitful to use ultra-deep bisulfite sequencing to be able to precisely characterize which cytosines are differentially methylated within a region. That would allow us to drill down to [a] higher specificity test. The idea was to let the samples tell us which bases were more important, so that we could design the most specific test.
How did you discover these breast cancer-related differentially methylated loci in the first place?
Orion has a novel technology for identifying differentially methylated loci in a genome, and we utilized this approach in a microarray-based analysis. Looking at DNA methylation, we discovered that there is an epigenetically common pathway to breast cancer. The top four biomarkers are differentially methylated almost exclusively in association with disease, and almost never in the normal tissue.
The way the Orion technology works for identifying differential methylation, it doesn’t care about any particular CG, it identifies regions of the genome that are differentially methylated. We thought that ultra-deep sequencing might be the best way to give us the most comprehensive view of how the Cs within that region change in association with disease.
Our bisulfite sequencing study had two phases: a tissue phase, and a serum phase. We knew from tumor tissues that these regions would be differentially methylated, and the idea was to study tumor molecules and figure out how they are differentially methylated, and then look in serum of both breast cancer patients and cancer-free patients. We wanted to see if you can identify some tumor-associated methylated molecules that are circulating in the cancer patient serum, but not in the normal serum. If you could identify those molecules, then you could build a specific test for a small subset of them, and then you would have a fantastic test for detecting breast cancer, because it would just be a blood-based test.
But to our surprise, when we did the study, what we found was, in fact, that normal patients have a surprisingly large amount of methylated DNA that’s circulating. The background in normal patients is actually very, very high, around 1 percent or so. With a background that high, it would be virtually impossible to detect one or two extra tumor molecules on top of that. That really was the take-home message of that study.
We say in the discussion that believing that tumors shed their methylated molecules in an amount that’s easy to detect early in disease is probably not realistic. If you really want to find differentially methylated regions in the genome, and build diagnostics to them in serum, the best thing to do is actually screen for those loci in serum directly. Comparing cancer-case serum to cancer-free serum, and looking at the whole genome, you will figure out where the background is lowest, and that’s where you would build your test.
Why did you choose 454’s sequencing platform for this?
There are two reasons: First, we really needed something to go ultra-deep. That means next-generation technology, because doing Sanger sequencing would have been prohibitively expensive. Our choices at the time we executed the study, about a year ago, were either Solexa or 454, because ABI's technology wasn't available yet.
We got access to the 454 technology through a collaboration with the Washington University Genome Center. They also had a Solexa machine available at that time. But when we looked at the Solexa read length, it really was too short to do the methylation analysis that we wanted to do.
By way of example, let’s say that the Solexa technology had an average read length of 32 bases, which is a generous number. The distribution of CpG dinucleotides in the human genome is such that you would only get approximately one CpG per read on average. If you are trying to build a molecular test, you want to have a read length that’s going to give you more than one CpG per read, because you want to know the methylation haplotype. The only way you can establish that for different CpGs is if you have longer reads, so there is more than one CpG in your read. Otherwise, you only get the average value everywhere, so you won’t know exactly what the molecule looks like. Instead, what you’ll know is what the average occupancy is.
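The arithmetic behind that argument can be sketched as a back-of-the-envelope calculation. This is an illustrative Python snippet, not anything from the study; it simply uses the density implied above (roughly one CpG per 32 bp of read) to compare short and long reads:

```python
def expected_cpgs_per_read(read_len_bp, cpg_per_bp=1 / 32):
    """Expected number of CpG dinucleotides covered by one read,
    assuming a uniform CpG density (the interview's rough figure:
    about one CpG per 32 bp)."""
    return read_len_bp * cpg_per_bp

# A 32 bp short read covers ~1 CpG, so it cannot phase CpGs within
# a molecule; a ~250 bp 454 read covers several, enough to read a
# per-molecule methylation haplotype.
print(expected_cpgs_per_read(32))   # ~1 CpG per short read
print(expected_cpgs_per_read(250))  # ~8 CpGs per long read
```

In practice CpG density is far from uniform (CpG islands are much denser than the genome-wide average), which only strengthens the point that phasing requires multiple CpGs per read.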
Methylation is a two-dimensional problem: You need to know the occupancy per molecule, and you need to know the abundance of that occupancy at that site across the population. You are actually slicing your data in two dimensions: across molecules, and within a molecule. And the problem is that Solexa data, and for that matter, data from ABI's SOLiD platform, doesn't have long enough reads to be able to reconstruct the molecular haplotype. 454 offered long reads, several hundred bases. Right at the time of our study they came out with the FLX, and our runs might have been some of the first runs on that platform, at least for bisulfite sequencing.
What role will next-gen sequencing play in other DNA methylation studies?
We are at this stage right now where people are just starting to answer that question: 'Now that I have this tool, next-generation sequencing — how can I use it to study DNA methylation?' That's the future. You have seen some snapshots of that. You have seen [the Whitehead Institute's] Alex [Meissner]'s experiments, which I think are very, very interesting (see In Sequence 11/20/2007).
But there are a lot of other approaches, for example one by Tim Bestor at Columbia University [see Q&A with John Edwards, In Sequence 10/16/2007]. He has an approach that’s just like the Orion approach, only they are using a SOLiD sequencer instead of a microarray for readout. They have a counting application. I told you that the Orion technology doesn’t care which Cs are methylated. The enzyme they use to interrogate the genome is methylation-dependent. That means it cuts in areas of the genome where there is methylation. It’s really nice to be able to use a counting sequencing application, because you can compare a genome that’s treated and a genome that’s not treated with the enzyme. If the locus is methylated, it will be underrepresented from your treated population, and then you can just count across the genome. You put molecules into bins, and the abundance of those fragments is a reflection of the amount of methylation at the locus.
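The counting readout described above can be sketched in a few lines. This is a hypothetical toy, not Orion's or Bestor's actual pipeline: fragments from a library treated with a methylation-dependent enzyme and from an untreated library are binned by locus, and loci underrepresented in the treated library are inferred to be methylated:

```python
from collections import Counter

def methylation_depletion(treated_bins, untreated_bins):
    """Toy sketch of a counting readout for methylation.
    Each input is a list of locus labels, one per sequenced fragment.
    Because the enzyme cuts where there is methylation, methylated
    loci are depleted from the treated library; the ratio
    treated/untreated per locus approximates 1.0 for unmethylated
    loci and approaches 0.0 for fully methylated ones."""
    treated = Counter(treated_bins)
    untreated = Counter(untreated_bins)
    return {
        locus: treated.get(locus, 0) / n_untreated
        for locus, n_untreated in untreated.items()
    }

# locusA is depleted after treatment, so it is inferred methylated
scores = methylation_depletion(
    treated_bins=["locusA"] * 2 + ["locusB"] * 10,
    untreated_bins=["locusA"] * 10 + ["locusB"] * 10,
)
```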
Would it be fair to say that for methylation studies where you don’t need to know the exact methylation sites, or where you don’t need to see haplotypes, short-read technologies are more appropriate?
I think that’s very fair to say. The majority of the work that I have done in the past ten years has all demonstrated that which C is methylated is almost never important. We have to remember that methylation in and of itself doesn’t really mean anything. It’s a reflection of the packaging status of that sequence. DNA methylation patterns are established as a consequence of histone methylation patterns. Remember, it takes 165 base pairs of DNA to go around a histone. What that means is, whether there is one methyl group, or two, or five, or ten, 165 bases are going around the histone. Thinking about the epigenome with a single-base view is probably myopic.
What does that mean for longer-read technologies like the 454?
If you really care about things like imprinting, they become important. In imprinted loci, half of the molecules are unmethylated, and half of the molecules are fully methylated. With a short-read technology, they would all look like 50 percent. So you could certainly use the short-read technology to identify loci that have 50 percent methylation, but you would have to go back in with a long-read technology to figure out which of those 50-percent loci represent partial methylation on every molecule that simply averages to 50, and which are the more interesting case, potentially allelic methylation.
The only sure way to identify allelic methylation is with a long-read technology. But there is another reason long reads are advantageous. Bisulfite treatment is mutagenic: you turn all the unmethylated Cs to uracil, and when you sequence those, they sequence as Ts. In effect, you take a four-letter genome, like the human genome, and turn it largely into a three-letter genome. Short-read technologies really suffer in a three-letter genome, because as soon as you take something that used to carry four letters of information and reduce it to three, your addressing gets even worse. Short-read technologies aren't really ideal for bisulfite sequencing, because there is too little information to map reads back, which is part of the reason Alex [Meissner] uses a reduced-representation approach: that complexity reduction allows him to map back efficiently. If you just tried to do the whole genome, short reads would never map. Bisulfite-free protocols don't suffer that problem, because all bases can be used to map the sequence back. The other reason is that it's a way to cover the space in just a few channels, that is, cheaper.
In addition, bisulfite, because it's mutagenic, destroys SNPs. And what's the most frequent SNP in the human genome? It's a C-to-T transition. What that means is, a huge number of the SNPs that would normally give you polymorphism information go away in bisulfite-converted DNA. So a bisulfite-free approach is much more amenable to short-read technology, and, if you are using a sequencing readout, you can get all of the SNP information. In a short-read approach like the one Tim Bestor is using, for instance, when they sequence the methyl-depleted genome and recover only one allele and not the other, they know which allele is specifically methylated. That's how you discover imprinted loci in a bisulfite-free universe: you look for missing allelic representation.
So Orion’s interest in methylation is in developing biomarkers for cancer. What’s NimbleGen’s interest in this?
My role in this project at Orion wasn't so much from the cancer biology perspective; it was really the technology development angle, and then the application of the tool to a fundamental problem. Now that I am no longer working at Orion, I am even more heavily focused on technology development. But I can't really tell you much more than that.
