Richard Simon, Chief, Biometric Research Branch, National Cancer Institute
In a study published in the Journal of the National Cancer Institute this week, NCI biostatisticians Richard Simon and Alain Dupuy report that half of the 42 clinical cancer microarray studies published in 2004 contained at least one of three major statistical analysis flaws.
The authors examined papers that related microarray-based gene expression profiles to clinical outcome in cancer patients — an area that is quickly growing in importance in the field.
The three primary failings they identified were “an unstated, unclear, or inadequate control for multiple testing” in outcome-related gene finding, “a spurious claim of correlation between clusters and clinical outcome” in class discovery, and “a biased estimation of the prediction accuracy” in supervised prediction.
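To make the first of these flaws concrete: testing tens of thousands of genes one at a time all but guarantees false positives unless the multiplicity is controlled. The sketch below is illustrative only, not taken from the paper; it shows one standard remedy, the Benjamini-Hochberg false-discovery-rate procedure.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false
    discovery rate at level alpha across m simultaneous tests."""
    m = len(pvals)
    # Rank p-values from smallest to largest, remembering positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k = rank
    # Reject the hypotheses with the k smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

# With 30,000 genes, naive per-gene testing at p < 0.05 would flag
# roughly 1,500 genes by chance alone; an FDR procedure avoids that.
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
# → [True, True, True, False]
```

This is one common approach; permutation-based methods are another option often used for microarray data.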
In the paper, Dupuy and Simon provide a checklist of 40 “do and don’t” recommendations for researchers to keep in mind for future microarray analysis.
BioInform spoke to Simon this week about the impact of these findings and how researchers might address some of the issues raised in the paper.
The problem of statistical analysis seems to rear its head from time to time in the microarray community, and it seems from the results of the study that there really hasn’t been all that much improvement in this area since around 2000 or so, when these issues really first came into focus for a lot of people.
It’s hard to say because we didn’t actually look at time trends, but my feeling is that there probably has been improvement. There are still a lot of problems, but there has been a lot of improvement. Our finding was that there were a lot of very good papers, but there were also a lot of papers with some major problems.
In the paper we said that, in our experience, microarray-based clinical investigations have generated both unrealistic hype and excessive skepticism, so yes, I think there is excessive skepticism, too. The technology is very powerful, it’s leading us very rapidly in the direction of personalized medicine in cancer, and I think things are getting better, but there are still some serious issues to confront.
I think really the issues are issues of interdisciplinary collaboration. When you have a technology that gives a readout of, say, the expression level of 30,000 genes, in order to analyze that kind of data and figure out either prognosis or who responds to what therapy, it’s very complicated, and one can’t expect that biologists or physician/investigators can analyze that kind of data by themselves.
So in some ways, the field has not taken seriously enough the need for interdisciplinary communication, collaboration, and in some cases, I think the problem has been misunderstood. In other words, it’s been viewed as a problem of informatics, a problem of data management, and I think that it’s also a problem of how to design these studies and how to analyze them properly, and sometimes the money’s been put into fancy ways of presenting the data and fancy ways of manipulating the data, but not in terms of bringing into the picture the statisticians who are needed to work collaboratively on these studies.
What was your motivation for looking into this problem?
I’ve been involved in cancer clinical trials for many, many years, and then about 10 years ago I decided to get involved in applications of statistics to genomics. Now, for the past couple of years I’ve been trying to merge those two things together, in terms of how to use this technology to help develop new treatments and to figure out which treatments work for which patients.
There are very few people who have a background in both clinical trials and in bioinformatics and statistics, so I guess I’ve just seen so much misunderstanding of how to bring these things together in order to move forward to personalized medicine, and I review a lot of papers for medical journals and I’ve seen problems there. What I really wanted to do was to try to develop some publication guidelines that would help authors and help reviewers, because I think the potential of the technology is tremendous and it’s very powerful. I think many people are using it very well, but there are still a lot of problems and there’s so much misunderstanding of some basic issues.
One of the things is people say, ‘Well, Group A tried to develop a predictor of breast cancer patient prognosis, and Group B developed a predictor of breast cancer patient prognosis, and they wound up publishing different sets of genes,’ and therefore people publish saying, ‘Well, the technology must be bad because they didn’t get the same set of genes,’ without realizing that that’s a basic logical flaw.
It doesn’t matter whether they got the same set of genes. There are lots of good statistical reasons why you wouldn’t get the same set of genes. The real test is whether the model developed by Group A predicts accurately for independent data, and whether the model developed by Group B predicts accurately.
So the test of a predictor is whether it predicts accurately for new data, not whether if you repeat the study with a new set of data, you come up with the same set of genes. So there’s just so much excessive skepticism as well as hype that we basically wanted to put together a paper that dealt with some of these things.
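The point that a predictor is validated by its accuracy on independent data, not by gene-list overlap, can be sketched as follows. This is illustrative code on hypothetical toy data, not any group's actual model; a nearest-centroid classifier is just one simple choice of predictor.

```python
def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean profile per class."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda c: dist2(centroids[c], x))

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return hits / len(y)

# Develop the model on one cohort (hypothetical two-gene profiles)...
X_train = [[1.0, 0.1], [1.2, 0.0], [0.1, 1.1], [0.0, 0.9]]
y_train = ["good", "good", "poor", "poor"]
model = train_centroids(X_train, y_train)

# ...but judge it only on independent samples never used in training.
X_test = [[1.1, 0.2], [0.2, 1.0]]
y_test = ["good", "poor"]
print(accuracy(model, X_test, y_test))  # → 1.0 on this toy data
```

The design choice to emphasize: the test samples play no role whatsoever in fitting the model, which is what makes the reported accuracy an honest estimate rather than the biased one the paper warns about.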
I’ve actually developed software myself for microarray data analysis. It’s called BRB-ArrayTools, and it’s really oriented toward use by biomedical scientists, not by statisticians, and we have over 7,000 registered users. We give it away for free on our website. I know that there are a lot of people trying to use this technology, and they’re trying to learn how to analyze this data and they’re grateful for any help they can get.
You put together a ‘do and don’t’ checklist for the paper. Are there any mechanisms available for enforcing this checklist? Would this be the role of journals?
I think it would be potentially useful if journals adopted either this or something like it. I’m on the editorial board of a number of journals, and sometimes we have recommendations for analysis, and we refer both authors and reviewers to them. And particularly if you refer reviewers to them, it can be very helpful.
I hadn’t really planned on trying to convince journals to use this. We’re just putting it out there and figuring that people want to publish good papers. But that actually probably would be a good approach if journals would adopt some guidelines for their reviewers and their authors.
There are actually a relatively small number of really important errors being made. We focused on three major errors, so it’s not as though there are a lot of little nitty-gritty details that people have to learn. We tried to focus on the big issues, and there are a small number of very serious errors, appearing in a lot of papers, that could be corrected.
For example, we addressed papers that measured expression profiles and measured some kind of clinical outcome. And for those kinds of studies, usually, their objectives are one of two types: Either they want to understand what genes are related to outcome for the purpose of understanding something about mechanism, or they want to be able to develop some kind of a tool that will permit you to predict outcome for future patients. And for either of those kinds of objectives, cluster analysis is not the right tool. So that’s one of the basic problems that we found, that people are using cluster analysis for problems for which cluster analysis basically has no role to play.
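As an illustration of the distinction, a supervised analysis ranks genes using the clinical outcome labels directly, whereas cluster analysis ignores the outcome altogether. The sketch below uses hypothetical toy data and a simple mean-difference score, not any particular published method.

```python
import statistics

def gene_scores(X, y):
    """Supervised gene ranking for two outcome groups: score each gene
    by the separation of its class means relative to the within-class
    spread. Unlike cluster analysis, this uses the outcome labels y."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    keys = sorted(groups)
    a, b = groups[keys[0]], groups[keys[1]]
    scores = []
    for g in range(len(X[0])):
        va = [row[g] for row in a]
        vb = [row[g] for row in b]
        diff = abs(statistics.mean(va) - statistics.mean(vb))
        spread = statistics.pstdev(va) + statistics.pstdev(vb) + 1e-9
        scores.append(diff / spread)
    return scores

# Hypothetical expression data: gene 0 tracks outcome, gene 1 is noise.
X = [[5.0, 2.0], [5.2, 3.0], [1.0, 2.1], [1.1, 2.9]]
y = ["responder", "responder", "nonresponder", "nonresponder"]
scores = gene_scores(X, y)
print(scores[0] > scores[1])  # → True: gene 0 ranks far above the noise gene
```

An unsupervised clustering of the same samples might happen to split along gene 0, or it might not; only the supervised score is guaranteed to target outcome-related genes, which is Simon's point about matching the tool to the objective.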