Peter 't Hoen
Researcher, Center for Human and Clinical Genetics
Leiden University Medical Center
Name: Peter ’t Hoen
Title: Researcher, Center for Human and Clinical Genetics, Leiden University Medical Center, the Netherlands
Background: 2002-present, post-doctoral researcher, Center for Human and Clinical Genetics, Leiden University Medical Center, the Netherlands; 2002 — PhD, natural sciences, Leiden University; 1997 — MSc, biochemistry, pharmacochemistry, Vrije Universiteit, Amsterdam, the Netherlands.
The advent of digital gene expression (DGE) applications performed on second-generation sequencers has raised questions about the role microarrays will play in the gene expression market in the future.
The Center for Human and Clinical Genetics at Leiden University Medical Center in the Netherlands recently compared expression data from five different array platforms — Affymetrix, Illumina, Applied Biosystems (now Life Technologies), Roche NimbleGen, and an in-house spotted array — with expression data from Illumina’s Genome Analyzer DNA sequencer to determine the best platform for new projects.
Specifically, the group looked at the ability of the various platforms to identify transcripts differentially expressed in the hippocampus between wild-type and transgenic mice that over-express a splice variant of the doublecortin-like kinase-1 (Dclk1) gene.
According to the authors, the results of their study, published in this month’s issue of Nucleic Acids Research, demonstrate “many advantages of DGE over expression microarray technology,” including an “unbiased view of the transcriptome,” the ability to detect “high levels of differential polyadenylation and antisense transcription,” and high “inter-laboratory comparability of DGE data.”
BioArray News spoke with lead author Peter ’t Hoen, a post-doctoral researcher at the center, last week to learn more about the comparison and its implications for expression array technology. The following is an edited transcript of that interview:
What is your primary research area of interest?
My main research interest is to apply all kinds of new, high-throughput technologies, mainly in the area of RNA expression profiling and proteomics, in the field of muscular disorders. I am developing the technologies in the direction of diagnostics, but also mechanistic studies, and studies into possible new therapies for muscular disorders.
My main expertise is the analysis of high-throughput data sets, and also the integrated analysis of these data sets.
To what extent have you used microarrays in your research in the past?
I have used array technology a lot, actually. I started here at the LUMC in 2002. Then, we were in the phase of optimizing our spotted array technology. We have gone through all the phases of the technology, to its maturation. We have also done a great many projects on commercial platforms. We are aided by the presence of the Leiden Genome Technology Center here within the department. This is the genomics center that provides technology to researchers and clinicians in the hospital, but also to researchers outside.
I noticed in the paper that you have access to quite a number of platforms. Not everyone has the opportunity to run their projects on Affymetrix, Illumina, NimbleGen, and ABI.
The philosophy of the Leiden Genome Technology Center is to have a broad portfolio so that customers have a choice of platform. This is only possible because the center acts as an open facility for external researchers.
In terms of the recent paper, why did you decide to use an Illumina Genome Analyzer over other second-generation sequencing systems?
At that time [in 2007], there was basically a choice between two — [Roche] 454 [Life Sciences] and Illumina. Well, the number of reads you get from 454 is just insufficient to do any RNA-based expression profiling. You need at least millions of sequences to be able to really say something quantitative, in particular, for the low-abundance transcripts.
So, there was not much of a choice back in 2007 when we acquired the machine. Currently, Applied Biosystems’ SOLiD system is another good alternative.
Can you describe your experience with the DGE application? Was it easy to use?
Actually, it was a very pleasant surprise, because the experiment we described in this paper was really the first experiment we did on this machine. So, it worked right from the beginning. We had no problems setting up the technology. Of course, we got training, but, after we were trained, everything went smoothly.
We have had some troubles during the past year. For example, the machine broke down a few times. These types of issues were caused, I think, because the first version of the machine, [the Genome Analyzer I], was put together with different, individual components as fast as possible to bring it on the market. Now, you see that the second machine, [the Genome Analyzer II], is actually more robust.
So, there was really no problem with the technique as such, and, for a new technology, that is in our experience quite an exception. Normally, you have to invest a lot of time to get things going. We know from microarrays that it took us years to get robust results.
What kinds of challenges were posed by analyzing the data?
In terms of data analysis tools, there wasn’t really much available. What came with the machine were just some very basic quality-control and data-extraction tools. For downstream analysis, we had to develop everything ourselves. Also, at the moment, data analysis lags far behind the experiments, I would say. It is very easy to generate a lot of data; it is far more difficult to really analyze this data in depth.
That is why people are [still] a bit hesitant to step over [to DGE]. I think we and others have demonstrated the great opportunities that we all have with such a machine, but we have to be able to analyze the data properly. We are very fortunate to have some great bioinformaticians here in the department, who are really working hard at constructing pipelines for data analysis.
There are several centers elsewhere in the world where that is being done, but for those people who do not have access to these kinds of resources, applying the technique is still a problem. For microarrays, there is a lot of data analysis software available, and some of it is quite user friendly. For sequencing, that is definitely not the case.
Is cost also a factor?
In our facility, they charge about two to three times as much for a sequencing experiment as they do for a microarray experiment. Since the quality of the DGE data is better, though, you tend to need fewer technical replicates, for example, so the increased quality can save money in some cases. You also get a lot more value for your money, I would say. You see a lot more. We saw the antisense transcription, for example, which you will not see on general microarrays. You get all of that extra, so it is really worth the money.
Prices will decrease. They are working right now on multiplexing of these assays, so that you can give all the samples a barcode and sort out which sequence read belongs to which sample, and so you can save even more money.
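To make the barcoding idea concrete, here is a minimal, hypothetical sketch (not the study’s actual protocol): each read carries a short sample-specific tag at its start, and reads are sorted into per-sample bins before counting. The barcode sequences and the 4-base tag length are illustrative assumptions.

```python
# Hypothetical illustration of barcode demultiplexing: assign each
# sequence read to a sample by its leading 4-base barcode, then trim
# the barcode off before downstream counting.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}  # assumed tags

def demultiplex(reads):
    """Sort reads into per-sample lists keyed by sample name."""
    bins = {name: [] for name in BARCODES.values()}
    for read in reads:
        tag = read[:4]
        if tag in BARCODES:                # unrecognized tags are dropped
            bins[BARCODES[tag]].append(read[4:])
    return bins

reads = ["ACGTTTGGA", "TGCAGGCCA", "NNNNTTTTT"]
print(demultiplex(reads))
```

In practice an aligner and error-tolerant barcode matching would sit around this, but the sorting step itself is this simple, which is why pooling samples in one lane cuts the per-sample cost.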
You compared wild-type and transgenic mouse hippocampus expression in this evaluation. Why did you choose this as a biological question?
This comparison was chosen because we had already compared the microarray platforms with these samples, so it was easiest to include sequencing as well. We also chose it because it is a biologically relevant comparison.
What we have seen so far in comparisons is that people have compared really different samples, like brain and heart, and looked at the differences, which are huge; you can’t even compare a brain sample with a heart sample, I would say. We decided to go for this particular setting, which is a normal biological setting in which people would be interested.
We knew that this would be quite challenging, mainly because in the hippocampus you normally have very subtle differences, and these differences in gene expression are only present in a subset of cells. The differences we found on the microarrays were very small, which made it very challenging to pick up anything. We thought sequencing could do better.
In what ways did you find DGE to perform better than array-based gene expression?
Of course, we expected to have more coverage and that is actually what we observed. You could measure very low-abundance genes, and you could measure transcripts that could not be interrogated on microarray platforms because there are simply no probes for these transcripts on those platforms. So, that was one advantage that we knew of beforehand and that came out very nicely.
The pleasant surprise was that we did the experiment in two different labs — in our lab and in the lab at the Illumina site in Hayward, Calif. The results were very reproducible, although we did everything separately. That is actually something that is a big problem for microarrays. It is very difficult to control conditions in the different laboratories so that they can generate similar results.
We also had the issue of sensitivity. What you find on microarrays is often ratio compression, which is mainly due to the background component in the signal. Here, you basically have no background; you either count something or you don’t. That has caused a real increase in sensitivity. The ratios between the two groups of samples were much bigger than on the microarrays, and probably more realistic as well.
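The count-based logic described here can be sketched in a toy example (hypothetical numbers, not data from the study): tags are counted per gene, counts are scaled to tags-per-million to correct for library size, and the ratio between conditions is taken directly, with no background term to compress it.

```python
# Toy illustration of count-based expression ratios: per-gene tag counts
# are normalized to tags-per-million, then the fold change between
# conditions is the simple ratio of normalized counts.
def tags_per_million(counts):
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}

# Assumed counts for two genes in two conditions (illustrative only).
wild_type  = {"Dclk1": 50,  "Actb": 950}
transgenic = {"Dclk1": 200, "Actb": 800}

wt, tg = tags_per_million(wild_type), tags_per_million(transgenic)
fold_change = {gene: tg[gene] / wt[gene] for gene in wt}
print(fold_change["Dclk1"])  # → 4.0
```

On an array, a constant background added to both signals would pull this ratio toward 1; with counts, the 4-fold difference is reported as-is.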
In your comparison, you also noted that Affy data was more similar to DGE data than, say, Illumina microarray data, even though Illumina sells both array and sequencing platforms. Can you explain this?
The difference between Affymetrix and Illumina is that Affy uses more probes per gene. So, I think when you talk about consistency, you are comparing two results that are not really comparable. The things we see with sequencing are very specific for certain transcripts, and it is not automatically true that you see the same transcripts with your probes on a microarray. I think that by putting more probes per gene on the array you are more likely to average out these probe-specific effects. So, that could be one of the reasons.
Of course, you adjust every platform to perform in an optimal way, but you are never entirely sure about it. What we have seen, both in terms of differentially expressed genes and in terms of signal intensities, is that the Affy platform was more comparable to sequencing.
Based on this assessment, will you continue to use arrays in your research or will you move more of your projects to DGE?
It depends a bit on the project, but for new projects I will never use any expression microarrays again, I think. It is hard to compare microarray data to sequencing data; the nature of the data is completely different. When you really want to continue an existing study, it is difficult to step over to the other technique, so in ongoing experiments we tend to use the same array platforms that we used before. For new projects, we are using sequencing.
I don’t think that microarrays will completely disappear, because, of course, with microarrays you can answer a more focused question. You could interrogate a specific subset of genes. You are not always interested in the complete transcriptome. There may be a future there, but for new projects I would go for sequencing, definitely.