Researchers at the Virginia Bioinformatics Institute recently assessed several microarray-analysis software packages based on their ability to provide researchers with “biological insight” — a seemingly subjective term that the VBI team painstakingly broke down into eight measurable characteristics in order to glean quantifiable feedback from participants.
While the study was limited in its scope — it evaluated only five visualization packages — industry observers applauded the effort as the first such usability study in bioinformatics, and hailed it as a sign that the field may finally be coming of age.
“I think usability is one of the barriers that keeps these tools from a much larger percentage of true biological users — not bioinformaticists or cheminformaticists or computational biologists, but bench biologists,” said Evan Steeg, a life science data-mining and IT consultant. “In order to get to the point where these tools become ubiquitous for biologists … there is a definite usability burden [that] has to be surmounted, and these kinds of studies will help.”
Michael Lelivelt, senior manager for informatics applications at Affymetrix, said that the VBI study is “a great start” toward quantifying visualization performance — as well as other features of microarray analysis software. “Certainly, visualization is important, but insights can come from many aspects of using software,” he noted in an e-mail to BioInform last week. “I think the best service to the community is to have independent parties continue to bring visibility to these issues.”
Karen Duca, co-author of the study, said that human-computer interaction evaluations are common in other fields, but are still a rarity in bioinformatics. One reason for this, she suggested, is that “the biology field in general hasn’t been used to dealing with dense data, so I won’t say that we never had the need for good interface design, but perhaps we’ve never been quite as demanding” as users in other fields.
Duca has been working with microarrays since 1998, and said that most of the analysis packages she’s tried over the years have been “very, very lacking” when it came to providing maximum insight. “I felt there was really a need to examine how the tools were doing in terms of what knowledge people took away,” she said.
To that end, Duca teamed with Christopher North, a human-computer interaction expert at Virginia Tech, and, with the help of grad student Purvi Saraiya, began identifying quantifiable characteristics of insights, or “units of discovery,” that could be encoded for analysis.
These included: fact (the actual finding about the data); domain value (the significance of the insight); hypothesis (whether an insight led directly to a new, biologically relevant hypothesis); breadth vs. depth (whether a finding was about a general biological process or a more specific one); directed vs. unexpected (whether an insight answered a specific question or was a serendipitous discovery); time (the amount of time required to reach the insight); and category (overview, patterns, groups, or details).
The researchers chose five software packages based on their “popularity and availability” — two from the commercial sector (Spotfire’s DecisionSite and Silicon Genetics’ GeneSpring) and three academic packages (Cluster/Treeview, TimeSearcher, and Hierarchical Clustering Explorer, or HCE). The study’s 30 participants, who included domain experts, domain novices, and software developers, were each assigned one of three data sets and a software tool with which they were not previously familiar. They were given a 15-minute tutorial, and were instructed to examine the data with the tool “until they felt they would not gain any additional insight.”
The VBI team found that Spotfire’s software scored highest in the total number of insight occurrences — with 25, as opposed to 20 for GeneSpring, the next highest, and 14 for HCE, the lowest. Spotfire, with 66, also ended up with the highest insight value (the sum of the domain value for those occurrences) compared to 40 for GeneSpring, the next highest, and 34 for HCE, the lowest. Researchers using Cluster/Treeview spent far less time gaining insight than with either of the commercial packages, however.
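The scoring scheme is simple to state in code: each coded insight carries a domain value, and a tool’s total insight value is the sum of those values over all of its insight occurrences. The sketch below is purely illustrative — the individual insight records are invented, not the study’s actual coded data; only the tool names come from the article.

```python
from collections import namedtuple

# One coded "unit of discovery": which tool produced it, the fact
# observed, and the domain value assigned to that fact.
Insight = namedtuple("Insight", ["tool", "fact", "domain_value"])

# Invented example records for illustration only.
insights = [
    Insight("Spotfire", "gene cluster up-regulated at 24h", 3),
    Insight("Spotfire", "outlier sample in one replicate", 2),
    Insight("GeneSpring", "pattern resembles cell-cycle genes", 3),
]

def score(insights):
    """Per tool, tally (insight occurrences, total insight value)."""
    totals = {}
    for ins in insights:
        count, value = totals.get(ins.tool, (0, 0))
        totals[ins.tool] = (count + 1, value + ins.domain_value)
    return totals

print(score(insights))
# → {'Spotfire': (2, 5), 'GeneSpring': (1, 3)}
```

Applied to the study’s real coded data, this tally is what yields figures such as Spotfire’s 25 occurrences and insight value of 66.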
One “surprising” finding, Duca told BioInform, was that the expert users “weren’t getting considerably more out of the packages” than the novices. “I think what this is telling us is that we need more methods and better methods, and then we’ll see that difference between the expert and the beginner widen.” She cited more dynamic and 3D representations, as well as sound cues or motion cues, as possible directions for improvement.
The VBI team is now using a similar methodology to assess several pathway-analysis software packages, Duca said.
Duca and her colleagues presented the results of the evaluation at InfoVis2004, a visualization conference held in October. A paper outlining the study has been published as part of the conference proceedings (http://csdl.computer.org/comp/proceedings/infovis/2004/8779/00/87790001abs.htm). Spotfire, touting its strong showing in the comparison, has also posted the paper on its website (http://www.spotfire.com/review/). Duca noted that Spotfire did not support the study “in any way, shape, or form.”
Yes, But …
Some observers pointed out limitations of the study. The paper “presents two commercial packages and three academic; that’s really not enough,” said Georges Grinstein, a data visualization and bioinformatics expert at the University of Massachusetts, Lowell. In addition, Steeg noted that “the data sets were rather small compared to today’s gene-expression data sets.”
Steeg also cautioned that the study’s focus on only one “axis” of microarray analysis — visualization — diminishes the significance of other important analysis features, such as statistical rigor.
“There’s a creative tension between visualization on the one hand — really enabling and harvesting human pattern recognition — versus having software actually deduce and infer what is correct, and they don’t always align,” Steeg said.
Duca herself acknowledged as “caveats” the small sample size of 30 researchers, the limited number of tools that were evaluated, and the fact that the three data sets were “tailored” for the experiment. “We had pruned the data somewhat to make it easier to find things,” she said.
Nevertheless, she noted, the primary goal of the study was to identify quantifiable measures by which the usability of these and other tools could be assessed in the future — a goal she thinks was accomplished. “I don’t think that people have tried to be quantitative about biological insight,” she said. “Clearly, this is the first of its kind, and we’ve got a lot more work to do, but it moves us from just filling out a sheet in terms of how fast you could do it, to also saying, subjectively, how much did you learn, and how much do you appear to have objectively learned that you didn’t know before you sat down?”
The Key to Commercial Success?
Chris Ahlberg, founder and CEO of Spotfire, told BioInform that the company already performs several different types of usability studies internally, but “at a smaller scale” than the VBI study. “What was really compelling in this paper was that they looked at the number of insights that people generated, and how they generated those,” he said. “The classic type of analysis that you do in a study like this is that you look at how quickly somebody completes a series of tasks, and how many errors did they make while doing it.
“That’s all well and good, but in the end game, you can do a task very efficiently, but it doesn’t really matter in gene expression analysis if you don’t find the stuff you want to find,” Ahlberg added.
While pleased with the VBI findings, Ahlberg said the study did provide some food for thought for future improvements. He did not provide details about the company’s plans, however.
Other vendors are also turning to usability studies to hone their products. “As we come up with new features and new versions, we take the prototypes and give them to customers who have never worked with the software before, and see how quickly — without a manual, without anybody explaining to them how to use the software — they can accomplish what it is that they are trying to accomplish,” said Jason Goncalves, general manager of Stratagene’s Iobion software business unit.
Goncalves wasn’t aware of the VBI study, but said, “It’s great for me to hear that someone is actually looking at this.” He noted that usability would be a key issue for commercial microarray software companies as the field evolves to meet the needs of a broader base of end-user biologists.
“My view is that there’s too much complexity around many relatively straightforward tasks, and that’s going to break in the future because we know that there is going to be the release later on, even in this year, of new types of arrays: denser arrays, tiling arrays, exon arrays, arrays for ChIP analysis,” he said.
Ahlberg also stressed the importance of designing user-friendly software that can appeal to a broader customer base. “If you go back and take what happened to the bioinformatics market over the last five years, it’s easy to say for all these software companies that were serving that market that the market went away. … But a lot of what happened was that you had bioinformaticians building software for use for themselves and hoping that the world would just come running and [would] want to spend a lot of money on it.”
Scientific software, he said, “has got to be very quick, intuitive, and user-friendly if you want to reach beyond those expert users, and that is very hard.”
Affymetrix’s Lelivelt agreed. “Since there are more end-user biologists than classic bioinformaticians, applications which bring more value to the end-user biologist will preferentially grow,” he said. “The challenge is bringing an application to market which captures the technical rigor associated with a bioinformatician and the usability that the end-user requires.”
Both Ends of the Spectrum
Striking that balance is one of the primary challenges for developers conducting usability studies for microarray analysis software.
“You can’t please everybody all the time,” said Steeg. “You have to segment your users, understand their input, and recognize that even for a given user, his or her needs will change over time.”
For example, he noted, even in one project, “the kind of tools you need at the very beginning, the exploratory part of a project, are different from what you need when you’re trying to validate something before you send it off to publication or to the FDA, or before you make an important business decision — a go/no-go decision on a new biomarker, for example.”
Visualization tools like Spotfire would be preferable for the first part of that process, while more statistically rigorous tools, like R or BioConductor, would be better for the second half, he said. The challenge for software developers, he added, is that “the world needs both.”
Grinstein, who described himself as “a non-advocate for usability studies,” argued that the microarray-analysis field would be better served with studies of software “utility” instead. “Too many people develop a tool and then go right into a usability study, as opposed to really thinking about how useful is that tool for solving new problems,” he said. “It’s really important to push the field, not just do scatterplots and focus on whether this interaction is better than that one. We need lots and lots of new tools.”
Grinstein said that software packages shouldn’t try to meet the needs of both expert users and novices. “It’s like a jet airplane,” he said. “Jet airplanes are very powerful, useful machines. Are they usable by us? No. You need an expert driver, but that’s okay. What I’m advocating is, ‘Let’s have some expert drivers of these new powerful tools that are evolving, and down the road we’ll have single Cessnas that everyone can fly.’”