Name: Rebeqa Gunnarsson
Title: PhD Student, Juliusson Group, Lund Strategic Research Center for Stem Cell Biology and Cell Therapy, Lund University, Sweden.
Researchers looking to survey copy number variations face a wide range of options when shopping for platforms, and scientists use diverse approaches to look at alterations, from comparative genomic hybridization on bacterial artificial chromosome (BAC) or Agilent oligonucleotide arrays to SNP genotyping on Affymetrix or Illumina chips.
A group of researchers led by Lund University who study chronic lymphocytic leukemia this month published a comparison of several commercially available platforms for detecting CNVs in Genes, Chromosomes, and Cancer. [Gunnarsson R, et al. Screening for copy-number alterations and loss of heterozygosity in chronic lymphocytic leukemia — a comparative study of four differently designed, high resolution microarray platforms. Genes Chromosomes Cancer. 2008 Aug;47(8):697-711.]
The group, which included scientists from Uppsala University, the Karolinska Institute, and the Rikshospitalet in Copenhagen, analyzed 10 chronic lymphocytic leukemia samples using Swegene BAC arrays, 185,000-marker Agilent arrays, the Affymetrix GeneChip Mapping 250K Nsp array, and Illumina’s HumanHap300-Duo BeadChip.
The comparison revealed 29 concordantly detected copy number alterations, or CNAs, including known recurrent alterations, which “confirmed that all platforms are powerful tools when screening for large aberrations,” according to the paper. Loss-of-heterozygosity analysis on the SNP arrays was also performed using a software tool called dChip developed at Harvard and Stanford universities.
To learn more about the study, BioArray News last week spoke with lead author Rebeqa Gunnarsson, a scientist in the Lund Strategic Research Center for Stem Cell Biology and Cell Therapy, Hematology and Transplantation.
Can you describe your scientific background? Why and how are you studying CLL?
We work in a lab with hematologists, and we have also collaborated with Uppsala University and the Rikshospitalet in Copenhagen, Denmark; those authors are on the paper. In our lab in Lund we all work in hematology, specifically on CLL. The other authors are a mix of scientists working on CLL and people working with different types of microarrays at various microarray facilities.
The reason for doing this comparison is that we have collected a lot of CLL samples; we have a biobank with around 600 samples. The goal is to use these samples in many experiments and try to answer important questions about CLL, such as why the disease develops and why there are differences between patient groups. One of the projects is to run microarrays on all the samples, so we wanted to compare different microarrays to see which ones suited our samples and could best answer our questions.
We also had the opportunity to do this because I have collaborated from the beginning with the Swegene microarray facility in Lund, which prints BAC arrays. That is what I began with when I started my PhD project; I ran some BAC arrays with CLL samples. So we already had a collaboration in Lund. Copenhagen contacted Agilent, and Uppsala arranged collaborations with Affymetrix and Illumina facilities. All of them were involved in the comparison, with each group working on a different array.
What were you looking for in a platform?
We wanted to see whether the different platforms could detect recurrent alterations. If you use a platform, you should be sure it can detect the alterations that are common in your samples. We also wanted to see how many other alterations each platform could detect and whether the platforms differed there.
We also wanted to gauge the technical performance of the arrays. We are using blood samples, and they are not always ideal because we haven't isolated the tumor cells, the leukemic B cells. There can also be subclones in the samples that carry alterations not found in other subclones. That is why you want a platform with a large log2-ratio response: it makes those alterations easy to detect when you segment the data.
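The effect she describes can be put into numbers. The sketch below makes several assumptions: a diploid genome, a diploid reference, and two copies of the region in every cell without the alteration (normal cells or other subclones). Under those assumptions, the expected log2 ratio of a region with c copies in a fraction p of the cells is log2((p*c + 2*(1 - p))/2). The function name and the fractions are illustrative only and are not values from the study.

```python
import math


def expected_log2_ratio(copies: int, fraction: float) -> float:
    """Expected log2 ratio for a region with `copies` copies in `fraction` of the cells.

    Assumes a diploid genome and reference, and that all other cells
    (normal cells or unaffected subclones) carry two copies of the region.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    mean_copies = fraction * copies + (1.0 - fraction) * 2.0
    return math.log2(mean_copies / 2.0)


# A single-copy loss (1 copy) and gain (3 copies) at decreasing tumor fractions.
for fraction in (1.0, 0.8, 0.5, 0.3):
    loss = expected_log2_ratio(1, fraction)
    gain = expected_log2_ratio(3, fraction)
    print(f"fraction={fraction:.1f}  loss={loss:+.2f}  gain={gain:+.2f}")
```

Under this model, a loss present in only 30% of the cells shifts the expected ratio from -1 to about -0.23. A shift that small is easier to segment on a platform with a strong log2 response and low noise.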
How did you design the comparison study?
We picked 10 samples, five with mutated and five with unmutated IGHV genes. We know from previous studies that unmutated samples carry more of the known recurrent alterations than mutated ones, so we wanted a mix of both. Then we ran these 10 samples on each platform using that platform's own settings.
We started by segmenting and normalizing the data in whatever way suited each platform best. When we first evaluated the platforms, we concluded that it was best to keep some differences in the segmentation and normalization settings, since the platforms behave differently.
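The interview does not say which segmentation algorithm was used. As a minimal sketch of what segmentation does, the code below applies plain recursive binary segmentation to a list of normalized log2 ratios. At each step it splits where the means on either side differ most, measured against a robust noise estimate, and it stops when no split clears a threshold. The threshold and minimum segment size are illustrative assumptions. Widely used methods such as circular binary segmentation refine this idea, for example by testing for two change points at once and assessing significance by permutation; this sketch leaves both out.

```python
from statistics import mean, median


def noise_sd(values):
    """Robust noise estimate from the median absolute difference of neighbouring probes."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    if not diffs:
        return 0.0
    # For Gaussian noise with SD sigma, |x[i+1] - x[i]| has median ~0.6745 * sqrt(2) * sigma.
    return median(diffs) / (0.6745 * 2 ** 0.5)


def best_split(values, sigma, min_size):
    """Return (index, score) of the split with the largest standardized mean difference."""
    best_index, best_score = None, 0.0
    for i in range(min_size, len(values) - min_size + 1):
        left, right = values[:i], values[i:]
        standard_error = max(sigma * (1 / len(left) + 1 / len(right)) ** 0.5, 1e-9)
        score = abs(mean(left) - mean(right)) / standard_error
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def binary_segmentation(values, sigma, threshold=5.0, min_size=3, offset=0):
    """Split log2 ratios into segments of roughly constant mean.

    Returns (start, end, segment_mean) tuples; `end` is exclusive.
    """
    index, score = best_split(values, sigma, min_size)
    if index is None or score < threshold:
        return [(offset, offset + len(values), mean(values))]
    return (binary_segmentation(values[:index], sigma, threshold, min_size, offset)
            + binary_segmentation(values[index:], sigma, threshold, min_size, offset + index))


# Toy profile: a deletion (log2 ratio -1) flanked by normal regions.
log2_ratios = [0.0] * 10 + [-1.0] * 8 + [0.0] * 10
segments = binary_segmentation(log2_ratios, noise_sd(log2_ratios))
print(segments)
```

Passing the noise estimate in once, rather than recomputing it inside each segment, keeps the stopping rule consistent across the recursion.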
To compare gains and losses across platforms, we grouped all the probes into virtual probes so that every platform was represented by the same set of virtual probes. We also took the normalized but unsegmented data and looked at technical performance, that is, noise level and log2-ratio response.
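The interview does not define the virtual probes precisely. One plausible reading, sketched here as an assumption, is to average each platform's probes within fixed genomic windows so that every platform reports one value per window. The noise measure below is the standard deviation of differences between neighbouring probes, divided by sqrt(2). It is one common way to quantify noise on normalized, unsegmented log2 ratios, but it is an illustrative choice and not necessarily the metric used in the paper. The probe coordinates are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def virtual_probes(probes, window_bp):
    """Average probe log2 ratios within fixed-size genomic windows.

    `probes` is an iterable of (chromosome, position_bp, log2_ratio) tuples.
    Returns {(chromosome, window_start_bp): mean_log2_ratio}.
    """
    bins = defaultdict(list)
    for chromosome, position, ratio in probes:
        window_start = (position // window_bp) * window_bp
        bins[(chromosome, window_start)].append(ratio)
    return {key: mean(ratios) for key, ratios in bins.items()}


def derivative_noise(ratios):
    """Noise level: SD of neighbouring-probe differences divided by sqrt(2).

    Needs at least three ratios, ordered by genomic position.
    """
    diffs = [b - a for a, b in zip(ratios, ratios[1:])]
    return stdev(diffs) / 2 ** 0.5


# Hypothetical probes from a dense and a sparse platform on the same stretch of chr13.
dense = [("chr13", pos, -0.9 if pos < 200_000 else 0.0) for pos in range(0, 400_000, 10_000)]
sparse = [("chr13", pos, -0.8 if pos < 200_000 else 0.1) for pos in range(0, 400_000, 50_000)]
print(virtual_probes(dense, 100_000))
print(virtual_probes(sparse, 100_000))
```

Both platforms end up with one value for each of the same four windows, so their gains and losses can be compared window by window regardless of probe density.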
And how did you select these specific platforms for comparison?
I had used the BAC platform before, so that was an easy choice. When we started this comparison, we hadn't seen many other published comparisons, so we chose an Agilent array that was one of the best available at the time. We also judged that it would be the one that suited the whole group of samples at that point. Illumina was then really new, and Affy was the one you would normally choose if you wanted a really good high-resolution SNP array at that time.
What is dChip?
dChip is a free software tool that you can download at www.dchip.org. It can be used for gene expression as well as SNP data analysis. We used it because it was free and could handle both Illumina and Affy data in the same program. Otherwise, with Affy you are limited to software developed only for Affy data, and the same goes for Illumina, whose analysis tools handle only Illumina data. We wanted a single program that fit both.
It analyzes the data with a hidden Markov model and performs the genotyping within the program. We used dChip for the LOH analysis to see whether the samples had LOH.
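This is not dChip's implementation; it is a simplified sketch of the idea. The code runs the Viterbi algorithm on a two-state hidden Markov model, "retained" versus "LOH". The only input is whether each SNP, in genomic order, is called heterozygous. The heterozygosity rates and switching probability are made-up illustrative parameters, not dChip's.

```python
import math

STATES = ("retained", "LOH")


def viterbi_loh(calls, p_het_retained=0.3, p_het_loh=0.01, p_switch=0.01):
    """Most likely retained/LOH label for each SNP call, in genomic order.

    `calls` holds genotype strings: "AB" is heterozygous, "AA"/"BB" homozygous.
    In the LOH state a heterozygous call can only arise from genotyping error,
    so its probability (p_het_loh) is set very low.
    """
    p_het = {"retained": p_het_retained, "LOH": p_het_loh}

    def emission(state, call):
        het = p_het[state]
        return math.log(het if call == "AB" else 1.0 - het)

    def transition(previous, current):
        return math.log(p_switch if previous != current else 1.0 - p_switch)

    # Log-probabilities of the best path ending in each state; equal priors to start.
    scores = {s: math.log(0.5) + emission(s, calls[0]) for s in STATES}
    backpointers = []
    for call in calls[1:]:
        step, new_scores = {}, {}
        for current in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + transition(p, current))
            step[current] = best_prev
            new_scores[current] = (scores[best_prev] + transition(best_prev, current)
                                   + emission(current, call))
        backpointers.append(step)
        scores = new_scores

    # Trace the most probable path backwards from the best final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return path[::-1]


# A long homozygous run flanked by heterozygous SNPs.
calls = ["AB", "AA", "AB", "BB", "AB"] + ["AA", "BB"] * 30 + ["AB", "BB", "AB", "AA", "AB"]
labels = viterbi_loh(calls)
```

The low switching probability is what keeps isolated homozygous calls in the flanks from being labelled LOH. Only a long homozygous run outweighs the cost of entering and leaving the LOH state.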
You list some pros and cons of each platform in the paper. Can you give a summary of your findings?
All the platforms detected all the large, known recurrent alterations, and they also detected CNAs that overlapped between two or three platforms. There were, however, specific differences between all the platforms.
If I were to summarize all the platforms, I would say the BAC platform is still good even though it is not as reproducible. BAC arrays are always produced in house, so there will be some variation. The platform has good technical performance and is also very cheap. So if you have simple questions and are looking at known alterations that are quite large, BAC would be a good choice.
Agilent had the best technical performance. It has good coverage and a lot of probes, which would facilitate breakpoint analysis if you wanted to confirm alterations by PCR. Unfortunately, Agilent doesn't offer the ability to detect copy-neutral LOH.
Both SNP platforms can show LOH, which is really good. Even if we don't know how LOH affects cancer cells, it is still a very important part of a cancer study. We compared the Affy and Illumina platforms and found that Affy detected more CNAs. Illumina did not detect as many, but it did offer more stable detection of LOH.
With Illumina, the LOH regions were not fragmented, and they overlapped better with the CNAs. Both are good platforms because you can use them for LOH, but each has its pros and cons.
How do you intend for this paper to be used by other researchers?
We learned a lot of things we didn't know when we started this project, and we thought the paper would be helpful in several ways. There are some questions you might not think about when you start to use microarrays. You pick a platform based on price or the number of probes and don't think about other things that come up. It is good to be aware of the issues that arise in this kind of project.
The platforms have different coverage in different genomic regions. If you are interested in specific regions, you should check how each platform covers them: whether the regions are included at all, and whether the probe coverage there is dense.
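Checking coverage can be as simple as counting probes in the region and finding the largest gap between them. This is a minimal sketch. It assumes you have each platform's probe coordinates for the chromosome, sorted in base pairs (for example, from the vendor's annotation file). The platform names, coordinates, and region are hypothetical.

```python
from bisect import bisect_left, bisect_right


def region_coverage(probe_positions, start, end):
    """Summarize probe coverage of the region [start, end] on one chromosome.

    `probe_positions` must be sorted base-pair coordinates on that chromosome.
    Returns (probe_count, probes_per_100kb, largest_gap_bp).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    inside = probe_positions[bisect_left(probe_positions, start):bisect_right(probe_positions, end)]
    # Gaps include the stretches from the region edges to the first and last probe.
    edges = [start] + inside + [end]
    largest_gap = max(b - a for a, b in zip(edges, edges[1:]))
    probes_per_100kb = len(inside) / ((end - start) / 100_000)
    return len(inside), probes_per_100kb, largest_gap


# Hypothetical probe coordinates for two platforms and a hypothetical region of interest.
platforms = {
    "platform_a": [100, 50_000, 120_000, 300_000],
    "platform_b": list(range(0, 400_000, 5_000)),
}
for name, positions in platforms.items():
    print(name, region_coverage(positions, 0, 200_000))
```

The largest gap often matters more than the average density. A small deletion can fall entirely between two probes of a platform that looks well covered on average.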
If you want to do LOH studies, you of course have to choose SNP arrays. If you don't have purified samples or you have subclones, you should be aware that this could affect the detection of CNAs. It is important to pick a platform with high technical performance. The data analysis options available for each platform could also help you answer your specific questions.
Finally, what platform will you continue to use for your own CLL studies?
Right now we are using the Affy 250K array. We still collaborate with Uppsala's Affymetrix facility, and we have applied this platform to three different projects. Two of them investigate CLL subgroups, and the third is one large study group of CLL samples with a lot of clinical data, which we are also running on Affy 250K arrays.
Price-wise, we would have chosen BAC arrays, but if you are going to run 600 samples, you want to be able to do a lot of things with them, for example LOH analysis. The choice between Affy and Illumina came down more to negotiations with the different facilities: what price they could offer us and when they could run the samples.
What could be done, in your opinion, to improve the platforms on the market for these kinds of studies?
For my purposes, if we have subclones in our data, it is not possible to look at all the different alterations with FISH, although FISH lets you see gains or losses even in small populations of cells. It would be good to have a tool that tells you whether you have subclones, and to what extent, so you can interpret the gains and losses in microarray data.
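Microarray data do give an indirect handle on this. The sketch below assumes a diploid genome, a known copy number c in the altered cells, and no noise. Under those assumptions, the observed segment log2 ratio r can be inverted to estimate the fraction p of cells carrying a gain or loss: p = (2^(r+1) - 2)/(c - 2). The estimate cannot tell normal-cell contamination apart from subclonality. The function is a hypothetical illustration, not an existing tool.

```python
def fraction_with_alteration(log2_ratio: float, copies: int) -> float:
    """Estimate the fraction of cells carrying an alteration with `copies` copies.

    Inverts log2_ratio = log2((p * copies + 2 * (1 - p)) / 2) for p, assuming a
    diploid genome and reference; the result is clipped to [0, 1].
    """
    if copies == 2:
        raise ValueError("a two-copy state does not change the log2 ratio")
    fraction = (2 ** (log2_ratio + 1) - 2) / (copies - 2)
    return min(max(fraction, 0.0), 1.0)


# A segment at -0.42 with an assumed single-copy loss, and one at +0.26 with an assumed gain.
print(fraction_with_alteration(-0.42, 1))
print(fraction_with_alteration(0.26, 3))
```

For example, a single-copy loss segment at -0.42 corresponds to roughly half of the cells, a figure that FISH could then confirm for that one region.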
It would also be helpful to have a tool that can handle CNVs. Sometimes you come across regions you can't explain, and it turns out they are common CNV regions, so you would like the program to be able to exclude or interpret them.
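A simple version of this compares each called alteration against a list of known CNV regions and sets aside calls that mostly fall within one. The list could, for example, be exported from a public catalogue such as the Database of Genomic Variants. The sketch uses half-open (chromosome, start, end) intervals. The 50% overlap rule and the example coordinates are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals, in base pairs."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def split_known_cnvs(alterations, known_cnvs, min_fraction=0.5):
    """Separate alterations that lie mostly within a single known CNV region.

    Both inputs are lists of (chromosome, start, end) tuples.
    Returns (likely_cnv, remaining).
    """
    likely_cnv, remaining = [], []
    for chromosome, start, end in alterations:
        length = end - start
        best = max((overlap_bp(start, end, s, e)
                    for c, s, e in known_cnvs if c == chromosome), default=0)
        if length > 0 and best / length >= min_fraction:
            likely_cnv.append((chromosome, start, end))
        else:
            remaining.append((chromosome, start, end))
    return likely_cnv, remaining


# Hypothetical calls and known CNV regions.
calls = [("chr1", 100_000, 200_000), ("chr13", 1_000_000, 5_000_000)]
known = [("chr1", 120_000, 260_000)]
print(split_known_cnvs(calls, known))
```

Flagged calls are returned rather than discarded, so they can still be reviewed by hand. This matters because a known CNV region can also harbour a genuine tumor alteration.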
I think it would be really good to combine LOH and CNA studies somehow, so you could look at regions with LOH and CNAs simultaneously. A lot of different programs are being developed now. With blood samples, you want to remove the normal cells, which would otherwise mask what is lost or gained in the genome. Some of these programs let you look at your cancer cells even if your DNA sample also contains DNA from healthy cells, by having the analysis software estimate the proportion of cancer-cell DNA in the sample.
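A first step toward looking at the two together is to annotate each LOH region with any copy-number call that overlaps it. This separates copy-neutral LOH, which has no accompanying gain or loss and which a non-SNP platform cannot detect, from LOH that coincides with a copy-number change. The interval format and labels are illustrative assumptions.

```python
def annotate_loh(loh_regions, cna_calls):
    """Label each LOH region by the copy-number calls it overlaps.

    loh_regions: list of (chromosome, start, end), half-open coordinates.
    cna_calls: list of (chromosome, start, end, kind) with kind "gain" or "loss".
    A loss takes precedence if a region overlaps both kinds of call.
    """
    annotated = []
    for chromosome, start, end in loh_regions:
        kinds = {kind for c, s, e, kind in cna_calls
                 if c == chromosome and s < end and start < e}
        if "loss" in kinds:
            label = "LOH with copy-number loss"
        elif "gain" in kinds:
            label = "LOH with copy-number gain"
        else:
            label = "copy-neutral LOH"
        annotated.append((chromosome, start, end, label))
    return annotated


# Hypothetical LOH regions and CNA calls from the same sample.
loh = [("chr13", 0, 100), ("chr17", 0, 100)]
cna = [("chr13", 50, 150, "loss")]
print(annotate_loh(loh, cna))
```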