NEW YORK (GenomeWeb) – Scientists looking to analyze microarray data in DNA methylation studies have half a dozen different statistical methods to choose from, and now they have received help in determining which ones to use based on the sample size of a given study.
Scientists led by Dongmei Li, a biostatistician at the University of Rochester, conducted a retrospective study using both real and simulated data to compare commonly used statistical methods. Their paper, published July 10 in BMC Bioinformatics, compared the false discovery rate (FDR), statistical power, and stability of the Wilcoxon rank sum test, t-test, Kolmogorov-Smirnov test, permutation test, empirical Bayes method, and bump hunting method.
"I've collaborated with lots of investigators doing DNA methylation studies where I've been asked to help analyze microarray data. I've found a lot of tools, and it's hard to choose which one I should use to analyze the data," Li told GenomeWeb. Thus, she set out to compare their performance in studies with small, medium, and large sample sizes.
Li said that the research found that when the sample size is small, the empirical Bayes and bump hunting methods "showed good FDR control." She classified a small sample size as between three and six samples. When sample sizes are large, with 24 or more samples per group, all methods worked well, although bump hunting had lower stability when there was a large proportion of differentially methylated loci.
Li often is brought in to analyze data from methylation arrays looking at CpG sites, such as the Infinium HumanMethylation450 BeadChip from Illumina and GeneChip Human Promoter from Affymetrix. The arrays represent less than 2 percent of all CpG sites in the human genome, but are sufficient to look for changes in methylation, Alika Maunakea, a scientist at the University of Hawaii and a collaborator of Li's, told GenomeWeb. Maunakea's research will often look for biologically significant differences in methylation levels between healthy and unhealthy populations.
"We have to make sure any differences between one sample and the next is robust," he said. "A methylation difference of one percent doesn't mean much, but if it's greater than 10 percent that's meaningful; perhaps there's some dysregulation at that site. But it's not only the difference in robustness between samples, the frequency of that difference in a sample population is important, too."
The statistical analysis Li uses helps his team define both, Maunakea said. Often the result is a change in methylation levels, measured as a beta value, along with an associated p-value. Both the magnitude of the methylation difference and its frequency are built into that result. "It provides us a way to sort out those potentially ambiguous data points that may not have a true biological difference and limit the false positives," he said.
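As a rough illustration of the beta value Maunakea describes, the sketch below computes it from a methylated and an unmethylated probe intensity and then takes the difference in group means. The intensities are invented for illustration, and the offset of 100 is a commonly used default that is assumed here rather than taken from Li's paper.

```python
from statistics import mean

def beta_value(methylated, unmethylated, offset=100):
    """Share of the total signal that comes from the methylated probe (0 to 1)."""
    return methylated / (methylated + unmethylated + offset)

# Hypothetical (methylated, unmethylated) intensities at a single CpG site
cases = [(9000, 900), (8000, 1900), (8500, 1400)]
controls = [(5000, 4900), (4500, 5400), (5500, 4400)]

case_betas = [beta_value(m, u) for m, u in cases]
control_betas = [beta_value(m, u) for m, u in controls]

# Difference in mean methylation between the two groups
delta_beta = mean(case_betas) - mean(control_betas)
print(round(delta_beta, 2))
```

By Maunakea's rule of thumb, a difference of this size would clear the 10 percent mark, though the p-value would still be needed to judge how consistent it is across samples.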
Among the statistical tests the researchers evaluated, the t-test is a basic test that compares the mean methylation levels of two groups. The Wilcoxon rank sum test is similar, except that it is a rank-based nonparametric test that compares the median methylation levels of the two groups.
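A minimal sketch of the two statistics at a single locus, using made-up beta values, might look like the following. The unequal-variance (Welch) form of the t statistic is an assumption, since the article does not say which variant the study used, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Difference in group means divided by its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_sum(a, b):
    """Sum of group a's ranks in the pooled data (ties share an average rank)."""
    pooled = sorted(a + b)

    def avg_rank(x):
        first = pooled.index(x) + 1  # 1-based rank of the first copy of x
        return first + (pooled.count(x) - 1) / 2

    return sum(avg_rank(x) for x in a)

# Hypothetical beta values at one CpG site
cases = [0.82, 0.85, 0.88, 0.90]
controls = [0.50, 0.55, 0.45, 0.52]

print(welch_t(cases, controls))
print(rank_sum(cases, controls))
```

The t statistic depends on the actual values, while the rank sum depends only on their ordering, which is what makes the Wilcoxon test less sensitive to outliers.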
The Kolmogorov-Smirnov test is another nonparametric test, Li explained. "It compares the distributions of methylation levels of one group versus the second. It's looking for both location and shape differences between groups," she said.
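The statistic behind the test is the largest gap between the two groups' empirical cumulative distributions. The sketch below, with invented values, uses two groups that share the same mean but differ in spread, the kind of shape difference a comparison of means alone would miss.

```python
def ks_statistic(a, b):
    """Largest vertical gap between the two empirical distribution functions."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    # The gap can only change at observed values, so checking those is enough
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

# Hypothetical beta values: same center, different spread
narrow = [0.48, 0.49, 0.50, 0.51, 0.52]
wide = [0.30, 0.40, 0.50, 0.60, 0.70]

print(ks_statistic(narrow, wide))
```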
A permutation test is a resampling-based nonparametric test that determines whether two groups have different distributions of methylation levels by permuting the data under the null hypothesis. "The distributions should be the same under the null hypothesis," Li said.
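For very small studies the permutation idea can be carried out exactly by trying every way of splitting the pooled samples into two groups. The sketch below does this for the difference in means, using invented values; the choice of test statistic and the function name are assumptions, not details from the paper.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    indices = range(len(pooled))
    extreme = total = 0
    # Under the null hypothesis every relabeling of the samples is equally likely
    for chosen in combinations(indices, len(a)):
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical beta values, three samples per group
cases = [0.82, 0.85, 0.88]
controls = [0.50, 0.55, 0.45]

p = permutation_p_value(cases, controls)
print(p)
```

With three samples per group there are only 20 possible splits, so even perfectly separated groups cannot produce a two-sided p-value below 0.1, one reason small studies are hard to analyze with resampling alone.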
The empirical Bayes method is different from standard Bayesian statistics, where priors are fixed before any data are observed. "It takes a hybrid classic Bayes approach and shrinks the estimated sample variance toward a pooled estimate in the moderated t-statistic," Li said, "which results in more stable inference in studies with small sample sizes."
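The shrinkage Li describes can be sketched as a weighted average of a locus's own variance and a prior variance. In practice the prior variance and its degrees of freedom are estimated from all loci on the array; in the sketch below they are simply supplied as hypothetical numbers, and the data are invented.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance(a, b):
    """Ordinary pooled two-group variance and its degrees of freedom."""
    df = len(a) + len(b) - 2
    var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    return var, df

def moderated_t(a, b, prior_var, prior_df):
    """t statistic with the locus variance shrunk toward prior_var."""
    var, df = pooled_variance(a, b)
    # Weighted average of the prior variance and the locus's own variance
    shrunk = (prior_df * prior_var + df * var) / (prior_df + df)
    return (mean(a) - mean(b)) / sqrt(shrunk * (1 / len(a) + 1 / len(b)))

# Hypothetical locus whose variance is tiny by chance in a 3-vs-3 study
cases = [0.520, 0.521, 0.519]
controls = [0.500, 0.501, 0.499]

ordinary = moderated_t(cases, controls, prior_var=0.0, prior_df=0)  # no shrinkage
moderated = moderated_t(cases, controls, prior_var=0.0004, prior_df=4)  # assumed prior
print(ordinary, moderated)
```

Without shrinkage, a two-point difference in methylation yields a very large t statistic simply because three samples happened to agree closely; the moderated version tempers it.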
The bump hunting method takes the correlations of methylation levels between nearby CpG loci into account and hunts for "bumps" along the smoothed function of estimated methylation level differences between groups from linear regression models.
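A heavily simplified sketch of the idea follows: smooth the per-locus differences along the genome, then report runs of neighboring loci whose smoothed difference exceeds a cutoff. The moving average, the cutoff, and the data are all assumptions made for illustration; published bump hunting implementations use more sophisticated smoothing and assess each candidate region's significance by resampling.

```python
def moving_average(values, window=3):
    """Smooth each position using its neighbors within the window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbors = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbors) / len(neighbors))
    return smoothed

def find_bumps(diffs, cutoff, window=3):
    """Return (start, end) index pairs of runs where |smoothed diff| > cutoff."""
    smoothed = moving_average(diffs, window)
    bumps, start = [], None
    for i, value in enumerate(smoothed):
        if abs(value) > cutoff and start is None:
            start = i  # a new run begins
        elif abs(value) <= cutoff and start is not None:
            bumps.append((start, i - 1))  # the run ended at the previous locus
            start = None
    if start is not None:
        bumps.append((start, len(smoothed) - 1))
    return bumps

# Hypothetical case-control differences at ten consecutive CpG loci
diffs = [0.01, -0.02, 0.00, 0.15, 0.18, 0.20, 0.16, 0.01, -0.01, 0.02]

print(find_bumps(diffs, cutoff=0.1))
```

Because each locus borrows information from its neighbors, a region can stand out even when no single site would pass a test on its own.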
Li's paper came up with a few conditional findings about when to use the different statistical methods.
When nearby CpG loci show correlated methylation in studies with a small sample size, Li recommended using the bump hunting method. "The CpG loci included on the array are from all over the genome but there are some situations where there are probes that come from the same region in genome," Maunakea explained. "Loci can be independently methylated but often clustered loci will all be methylated."
"If we see changes in methylation at nearby sites, it can give confidence that something is going on," Maunakea added. "If it's a low sample size, it's hard to know whether a difference at a single CpG site is biologically meaningful. You rely on surrounding sites; you don't have enough statistical power to show the significant difference between case and control, but the nearby sites might be informative."
Conversely, in studies with large sample sizes, where there was a large proportion of differentially methylated loci, Li recommended against using the bump hunting method because it had lower stability.
Maunakea interpreted that to be related to the fact that many methylation arrays are designed to look for differences in methylation related to cancer. Cancer samples can often show a stark difference in methylation levels compared to healthy controls. For a hypothetical study with 20 samples, it wouldn't be out of the ordinary to see 19 of those samples with a 50 percent methylation difference, he said.
For other conditions the differences can be much subtler; there might be only one sample with a difference of 50 percent at a particular locus, he said. "The rest are maybe 20 or 10 percent."
The methylation arrays can still pick up interesting results for studies focusing on conditions other than cancer, Maunakea said, but to do so, scientists must tailor their analysis to account for features in the data coming out of the array.