At A Glance
- Terry Speed, senior research fellow, genetics and bioinformatics group, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; and professor, department of statistics, University of California, Berkeley.
- 1969 — PhD, mathematics and statistics, University of Melbourne, Australia.
- 1965 — BSc, mathematics and statistics, Monash University, Australia.
Spring is always around the corner for microarray statistical analysis guru Terry Speed. Half of his year, from January through May, is spent as a professor of statistics at the University of California, Berkeley, and the other half as director of bioinformatics for the Walter and Eliza Hall Institute of Medical Research in Melbourne, Australia.
Speed also serves as a bridge between the Stanford self-spotters and commercial array makers like Affymetrix. His lab produces the SMA (statistical microarray analysis) and marray libraries for bioconductor.org, which are written in the free, open-source statistical language R.
He recently spoke with BioArray News.
Living for half a year in Australia and the other half in Berkeley must give you a unique perspective on the genomics world.
They are very different scenes; Australia is not as developed in biotechnology, but still strong in its own way. There is not a lot of money going into research but [there are] a lot of bright people thinking up good experiments. It's nice to be in the middle of experimentalists for an extended period of time, rather than having to go looking for them. When you work on a big campus like Berkeley, you have to leave your own enclave, go down to the bottom of the campus, or up the hill. Whereas in my job in Australia, I'm having coffee with them.
You are a statistician; why work in genomics?
As a statistician, you have to point your statistics at something. I point it [at] genomics, rather than the bureau of the census, or clinical trials. If you want to apply a branch of mathematics, statistics, or computing, it doesn't make a whole lot of sense if you don't understand the context. It's a matter of degree: with some things you can get by with relatively little contextual knowledge, but microarrays are definitely not one of them — the more you know, the more you are able to be useful. They are a relatively complicated process, these microarray assays. If you are not aware of many of the complications, you can trip up.
How did you first get involved in microarrays?
I have been on the edge of genetics and genomics for 15 years now. I was around when they started emerging. I have been affiliated with groups of people — mathematicians, statisticians, and molecular biologists — that get together regularly. I heard Pat Brown and the Affymetrix people promoting their stuff as soon as it was ready to go public. At some point, somebody wanted to get help with a grant application or an analysis and I said "Sure, it sounds interesting." Then somebody said: "Here is some data, give us some help," then you roll up your sleeves and you are in it. A lot of people started to get into the cDNA stuff two or three years after Pat Brown and David Botstein showed how effective that was. Amongst the world of statisticians and bioinformaticians, I was fairly well prepared to move, simply because I had been watching it for two or three years.
What do you see as your role in this field?
It would be great to say that I’d like to cure cancer. I have a rather humbler view of my role, which is to sharpen or improve the tools that people use to do grand things like cure cancer.
Statisticians get relegated to the role of policeman — saying "yeah, that method is right," or putting a "P" value on something — you are not particularly involved and you are there to give it a seal of approval. In the world of gene expression, you can have a much more active role. There are clear problems that need solutions [where] statisticians have the skills to contribute. There are opportunities for intervention, for discovery of algorithms that can be helpful. You have a chance to do something creative to try to figure out what is going on, whether it's designing the experiment, or discovering methods to achieve an end that somebody thinks is feasible. It's not just how you treat the data that comes off the chip. There are ways of treating the data well and not so well. It's figuring out the best way to get the information from data. That is one crude description of the task of statisticians.
What improvements do you foresee for tools?
The main thing that is lacking is not just tools; it is people having ideas that address hard questions, or helping design experiments to address hard questions. One of the things that people are going on a lot about is pathways, whether they are biochemical pathways, or signaling pathways. They are very complex entities and they are of enormous interest biologically. But the extent to which you can learn about the novel ones from microarray data is very unclear. You think if you take a lot of snapshots of gene expression, you really should be able to learn a lot about things like pathways, which gene is on, and who represses whom. If somebody says, “We need tools to elucidate signaling pathways” — that is so vague that it is almost meaningless. What tool you need will depend on the nature of your experiment. We need people thinking about experiments that throw light on signaling pathways way before we need tools.
What are your thoughts about normalization tools?
Normalization — cleaning data, making things comparable, getting rid of bias — is where I spend more of my time than on anything else. It is on the table, and it will continue to be considered. There are already lots of tools out there and I'm sure there will be new ones and better ones. There is not a lot of novelty there anymore. As the field changes, our understanding of the processes that need to be addressed needs to increase. The paper that I wrote two years ago about normalization is no longer satisfactory. We have to keep improving.
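[Editor's note: To illustrate the kind of bias removal Speed is describing, here is a minimal sketch, in Python rather than R, of global median normalization of two-channel log-ratios. This is a deliberately simple special case, not the loess-based method of Speed's group, and the data are made up.]

```python
import math

def ma_values(red, green):
    """Convert paired red/green spot intensities to (M, A) coordinates:
    M = log2 ratio of the two channels, A = average log2 intensity."""
    M = [math.log2(r / g) for r, g in zip(red, green)]
    A = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return M, A

def median(xs):
    """Median of a list of numbers."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

def global_median_normalize(M):
    """Shift the log-ratios so their median is zero, removing a
    constant dye bias between the two channels."""
    m = median(M)
    return [x - m for x in M]

# Toy intensities with a systematic two-fold red bias on every spot.
red = [200.0, 400.0, 800.0, 1600.0]
green = [100.0, 200.0, 400.0, 800.0]
M, A = ma_values(red, green)
M_norm = global_median_normalize(M)
```

After normalization the median log-ratio is zero, so a spot's remaining M value reflects differential expression rather than dye bias. Intensity-dependent (loess-style) normalization fits a smooth curve to M as a function of A instead of subtracting a single constant.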
What are your thoughts on the bifurcation in the microarray world?
There is more in common with the methods than you might think. The field maybe looked separated for a couple of years because Affymetrix, let's just say, was not terribly open to encouraging alternatives to their approach. That changed perhaps a year or two ago, and they have been much more open and encouraging, and a commonality of approaches has evolved, and become much more transparent. I certainly kept right away from Affymetrix for most of my time simply because there was definitely not a welcoming environment for people to come in and [do] something different with the Affy data. I'd say also that people laboring to make the cDNA work came to realize that maybe spending a few thousand to use Affy chips would give you results quicker than the six to 12 months of getting a postdoc to try to get the technology working in the lab. You could just pay your money, take an off-the-shelf chip, and two weeks later, you've got an experimental result. I don't think of them as two totally different worlds; I think of them as variations on a theme. So what you do with short oligos on the Affy chip is not hugely different from long oligos, which is not different from what you do with cDNA. You have one area with a lot of variations. The ways we work with Affy have turned out to be effective with cDNA, and vice versa. I don't see a huge gulf anymore.
Are you a Switzerland in the microarray world?
It’s quite good to not be in one camp or the other. As for what chips to use, that is context dependent. I use an old phrase, “horses for courses,” [in other words], if you are going to use hundreds of chips, unless you are a pharmaceutical company or an extraordinarily funded academic, you won’t be using Affy. It’s as simple as that.
Tell me about BioConductor.
We contribute to BioConductor. It's a spinoff of the R organization, devoted to R-based tools for biology and bioinformatics. R is a statistical language, and it will become the most widely used academic and research statistical package. The most widely used application today is definitely SAS. But it is not cheap and it is certainly extensive. With SAS, you can't have [a] little bit that does the job; you have to have the whole box and dice. It is a serious commitment, financially and computationally.
With BioConductor, if I want to do normalization, I can get some code [a module] that will do normalization and the rest of the stuff in Excel.
BioConductor is where microarray stuff goes. It's not simple downloadable stuff, it's not Excel, and it's not straightforward point-and-click. It has the logic, syntax, and structure of R. We want to put it [BioConductor] into a form that can be widely used. People are trying to develop much more friendly statistical packages built on the R statistical language, but I don't have resources to spend on developing interfaces like that; it's costly, and it's not easy to get money to do that sort of thing in the academic world.
BioConductor gets better and better; people work harder and harder. It's free, downloadable, high quality, well-documented, and supported. It's the sort of thing that the academic community is keen on. Free stuff gets tried, feedback comes in, and it gets improved. If some smart young professor creates software and slaps a price tag on it, you can almost guarantee it won't be widely used. The greater the price tag, the fewer people are going to try it.
The BioConductor packages for cDNA and Affymetrix start off basic and evolve as time goes on. You need special tools to deal with special problems, like microarrays.