SAS Institute Looks Toward Genomics; Hopes to Bridge the Gap Between Statistics and Biology

Statistical software giant SAS Institute has established a genomics group that will aim to apply its methodology to analyzing gene expression data, Russ Wolfinger, the director of the division, told the New England Bioinformatics Group last week.

Wolfinger told those gathered for the first NEBiG meeting of 2001 that a statistical approach to gene expression analysis could improve the accuracy of experimental results.

The method he described uses two interconnected mixed linear models to assess statistical differences in gene expression data from microarrays.

The introduction of the new method comes amid a host of claims that the bulk of current gene expression experiments are fundamentally flawed because the importance of statistical significance is often overlooked. Other researchers, such as Eytan Domany of the Weizmann Institute of Science in Israel, are also developing new clustering algorithms designed to help make sense of the vast amount of microarray data.

Wolfinger said most papers published on the topic consider only fold changes in expression, eliminating those genes with lower than a two- or three-fold change. However, this approach could miss a biologically important gene that may exhibit a 1.2-fold change that is reproducible and precisely measurable and, therefore, statistically significant. Conversely, he said, some genes may have a large fold change in one array but be highly variable from array to array, which would lower their statistical significance.
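
The contrast between fold change and statistical significance can be illustrated with a small sketch. The one below is not the mixed-model method Wolfinger described; it substitutes a plain one-sample t statistic on log2 expression ratios, and the replicate values for the two genes are invented for illustration only.

```python
import math
import statistics


def t_statistic(log_ratios):
    """One-sample t statistic testing whether the mean log2 ratio differs from 0."""
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n))


# Hypothetical log2(treated / control) ratios from four replicate arrays.
steady_gene = [0.25, 0.27, 0.26, 0.28]  # roughly 1.2-fold on every array
noisy_gene = [2.0, -0.6, 0.3, -0.4]     # 4-fold on one array, inconsistent elsewhere

for name, ratios in [("steady", steady_gene), ("noisy", noisy_gene)]:
    mean_fold = 2 ** statistics.mean(ratios)
    print(f"{name}: mean fold change {mean_fold:.2f}, t = {t_statistic(ratios):.2f}")
```

Under these assumed numbers, a two-fold cutoff would discard the steady gene even though its small change is highly reproducible, while the noisy gene's single large ratio is not supported by the other replicates.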

Calling the approach complementary to clustering, he explained that it works “by answering a different question.”

“Clustering looks for similarities. This looks for differences,” he said. By accounting for variance in expression levels, he said, the approach ensures that “the signal is greater than the noise,” which reduces the occurrence of false positives and false negatives.

An added benefit of the method, according to Wolfinger, is that it works just as well without a reference sample. Instead, he said, it uses a circular design, comparing the first sample to the second, the second to the third, and so on until the final sample is compared to the first. Though this claim appeared counterintuitive to some of the NEBiG attendees, Wolfinger said that unlike clustering, which has to “compare apples with apples,” the differences in variability that his method relies on are accounted for in the statistical model.
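
The pairing scheme of the circular design described above can be sketched in a few lines. The function name `loop_design` and the sample labels are hypothetical, and the sketch covers only which samples share an array, not the statistical model that analyzes them.

```python
def loop_design(samples):
    """Pair each sample with the next one, wrapping the last back to the first."""
    n = len(samples)
    if n < 2:
        raise ValueError("a circular design needs at least two samples")
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


# Hypothetical sample labels; each tuple is one two-sample array.
pairs = loop_design(["A", "B", "C", "D"])
print(pairs)
```

In this layout every sample appears on exactly two arrays, so no separate reference sample is needed to link the comparisons together.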

But the question remains whether the approach will help biologists and statisticians meet on common ground as Wolfinger envisions. Noting that this method requires four or more replicates, one attendee remarked that her lab never runs a microarray experiment more than twice due to time and cost constraints.

“I’m a statistician,” Wolfinger quipped, “of course I’m going to tell you to keep doing it over and over again if you can afford it.”

Wolfinger developed the method in collaboration with Greg Gibson, of North Carolina State University; Elizabeth Wolfinger, of Meredith College; and researchers from the National Institute of Environmental Health Sciences.

The team has submitted a paper on the approach to a leading journal.

— BT
