Roger Bumgarner says array data can be meaningful if treated properly
Roger Bumgarner is a research assistant professor at the University of Washington and director of the University’s Center for Expression Arrays. He can be reached at [email protected]
In the ‘Kevin Bacon game,’ players try to connect any actor in Hollywood to Kevin Bacon in six or fewer steps, where each step is a film in which two actors appeared together.
The idea is that when you have small clusters of relationships and even a limited number of connections between those clusters, it takes only a few steps to get from a member of one cluster to a member of another.
Cornell mathematicians actually turned the game into mathematical theory with their June 4, 1998 Nature paper, “Collective dynamics of ‘small-world’ networks,” in which they showed that an Internet movie database, the western US power grid, and the neurons of C. elegans are all examples of tightly connected small-world networks, where one can get from any point to any other very quickly.
I like to use this game as an analogy for explaining why common analysis methods in current microarray papers may not be telling us much that is biologically meaningful. My claim is that, like actors connected to Bacon, any two known genes can be related to each other by six or fewer publications.
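To make the analogy concrete, here is a minimal sketch of how one might count the "degrees of separation" between two genes. The gene names and the co-mention graph are invented for illustration and stand in for a real literature search; an edge simply means two genes appear together in at least one publication.

```python
from collections import deque

# Hypothetical toy literature graph (all names invented for illustration).
# An edge means two genes are mentioned together in at least one publication.
CO_MENTIONS = {
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneA", "geneB", "geneD"},  # geneC-geneD bridges the two clusters
    "geneD": {"geneC", "geneE", "geneF"},
    "geneE": {"geneD", "geneF"},
    "geneF": {"geneD", "geneE"},
}


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: fewest edges from start to goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


print(degrees_of_separation(CO_MENTIONS, "geneA", "geneF"))  # 3
```

Even in this tiny example, a single bridging publication is enough to link every gene in one cluster to every gene in the other within three steps, which is why a literature link on its own says little.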
The standard array paper, including some that I’ve helped publish, goes something like this. “We compared this sample to that sample. We found this list of differentially expressed genes. See Table One (long list of genes).” Then out of these 200 or so genes that are differentially expressed, the researchers select 10 to write about.
At this point the researchers find some paper in the literature that connects the particular selected genes to other genes or the biology of interest. But just because expression of one gene appears to affect expression of another, this does not mean that the expression changes are directly related to the same biological phenomenon.
The networks of genes are like a spiderweb. If we tug on a little piece of a spiderweb, the whole web rearranges, and that rearrangement is what we end up looking at. Similarly, if I tug on one gene, it’s going to affect a whole bunch of other genes that are only peripherally related to the main biological function of the gene. And that’s where a lot of people get lost.
So how do you deal with microarray data in a meaningful way?
Invariably, you are going to need some type of data other than microarray data, or you will need multiple comparisons. If you read all the array papers in the published literature and look at the ones that have resulted in actual biological understanding, all of them are papers where multiple types of comparisons were done, where the researchers had some way to narrow down a very large set of genes and focus on a much narrower set.
You also have to do replicates of your arrays. If you don’t do replicate measurements, you don’t know whether the gene expression changes you found are reproducible or are artifacts of a particular experiment. My guess is that somewhere on the order of 70 percent of all the genes published as differentially expressed are not reproducible.
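As a rough illustration of what a replicate check might look like, the sketch below keeps only genes whose expression change passes a cutoff, in the same direction, in every replicate. The log2 ratios, the gene names, and the cutoff of 1.0 (a two-fold change) are all assumptions made up for the example, not values from any real experiment, and a real analysis would more likely use a proper statistical test.

```python
def reproducible_genes(log_ratios, threshold=1.0):
    """Return genes whose log2 ratio meets the threshold, in the same direction, in every replicate.

    log_ratios maps a gene name to a list of log2(sample / reference) values,
    one per replicate array. The threshold is an illustrative assumption.
    """
    kept = []
    for gene, values in log_ratios.items():
        if len(values) < 2:
            continue  # a single measurement cannot demonstrate reproducibility
        all_up = all(v >= threshold for v in values)
        all_down = all(v <= -threshold for v in values)
        if all_up or all_down:
            kept.append(gene)
    return sorted(kept)


# Hypothetical replicate measurements (invented numbers).
REPLICATES = {
    "geneA": [1.8, 2.1, 1.6],     # consistently up
    "geneB": [2.4, 0.1, -0.3],    # up on one array only: likely an artifact
    "geneC": [-1.5, -1.2, -1.9],  # consistently down
    "geneD": [1.3],               # never replicated
}

print(reproducible_genes(REPLICATES))  # ['geneA', 'geneC']
```

Note that geneB would have made the published list if only the first array had been run.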
Take viral infections as an example. You compare infected cells and uninfected cells. Maybe you want to compare them at several time points. Maybe you want to inactivate the virus and separate the genes that are differentially expressed due to attachment from those that are differentially expressed due to replication. When you start doing enough of those experiments, you can tease out of the mess the expression changes that are more closely related to one phenomenon than to the others, and you have a chance of understanding the biology.
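The narrowing-down step in the viral example can be sketched with simple set operations. This assumes, for illustration only, that an inactivated virus still attaches to cells but does not replicate, and that each comparison has already been reduced to a list of reproducibly changed genes. The gene names and lists are invented.

```python
def split_by_cause(live_vs_uninfected, inactivated_vs_uninfected):
    """Partition differentially expressed genes using two comparisons.

    Assumption for this sketch: inactivated virus attaches but cannot replicate,
    so genes changed in both comparisons are attributed to attachment, and genes
    changed only with live virus are attributed to replication.
    """
    attachment = live_vs_uninfected & inactivated_vs_uninfected
    replication = live_vs_uninfected - inactivated_vs_uninfected
    return sorted(attachment), sorted(replication)


# Hypothetical gene lists from two array comparisons (invented names).
live = {"geneA", "geneB", "geneC", "geneD"}
inactivated = {"geneA", "geneC", "geneE"}

attachment_genes, replication_genes = split_by_cause(live, inactivated)
print(attachment_genes)   # ['geneA', 'geneC']
print(replication_genes)  # ['geneB', 'geneD']
```

Each additional comparison, such as another time point, would cut the candidate list further in the same way.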
Opposite Strand is a forum for readers to express opinions and ideas about trends and issues in genomics. Submissions should be kept to 550 words and may be submitted to [email protected]