Usability should be engineered into the development process of web-based bioinformatics resources, particularly because databases and applications don’t grow in linear patterns, according to a team of researchers from the UK, the US, and Italy.
Writing in a paper in Bioinformatics appearing online in December, the researchers explained how they used “state-of-the-art” usability evaluation methods to look at CATH (Class, Architecture, Topology, and Homologous superfamily), a browsing-oriented repository of protein classifications, and also studied user behavior when searching through BioCarta, Swiss-Prot, and the National Center for Biotechnology Information's databases.
In navigating CATH, the scientists noted, for example, that users encounter “breakdowns” when browsing across sub-systems of the repository that are updated at different times. This difference may not be clear to users, and navigation can take them to “obsolete content.”
According to the study, users faced “significant” barriers when searching, both in formulating a query and in interpreting the long lists of results the repository returned.
The scientists outlined remedies for some of the challenges these bioinformatics resources present, and emphasized that bioinformatics resources stand to gain much from a “human-centered development process.”
Usability is a new field in bioinformatics and is a “very unexplored area so far,” lead author Davide Bolchini, a researcher at Indiana University’s School of Informatics, told BioInform.
The research team also applied knowledge of usability evaluation methods and design practices to show how they might be applied in bioinformatics, he said.
Data-intensive web applications have “been around for a while; there are some good practices of design [showing] potential benefits that can come from design expertise in other fields,” he said.
With this paper, the team wanted to “start to characterize some general problems, to capture the tip of the iceberg,” Bolchini said.
A lot of effort has been spent on development, data integration, and data visualization in bioinformatics resources, but less on “the difficulties users may encounter while searching this data, making sense of this data,” Bolchini said.
Coming of Age
The study unfolded during Bolchini’s post-doctoral fellowship from 2007 to 2008 with University College London computer scientist Anthony Finkelstein and colleagues, who were “applying usability methodologies and usability evaluation methods to improve the quality” of bioinformatics applications, according to Bolchini.
Finkelstein heads the department of Software Systems Engineering at University College London’s department of computer science.
Finkelstein's group focuses on large, complex information systems, including bioinformatics and other scientific fields, as well as commercial systems. When Bolchini arrived at the lab, Finkelstein had been exploring human-computer interaction issues with Italian researcher Vito Perrone, with whom Bolchini had previously collaborated while at the University of Lugano in Switzerland.
Finkelstein told BioInform that in almost every field, the first phase of application development “is dominated by a sort of a tech can-do and then people take a step back [to think about] is this actually what people are wanting … and how people are going to work with things. It’s a sort of coming of age,” he said.
Improving usability enables people to “do better things,” he said. “We have all used rubbish applications with time wasted clicking that goes on in navigation across deep hierarchies, where people haven’t thought about it from a user’s perspective.”
Finkelstein and his UCL colleagues have also been working on the National Cancer Research Institute Informatics Initiative, the UK’s equivalent of the National Cancer Institute’s cancer Biomedical Informatics Grid, or caBIG, Finkelstein said. He and his colleagues are helping to build usability evaluation into the development process at NCRII.
“It’s about ensuring the infrastructure we build is focused on serving the users,” he said. “Much of the data has been wrested from the wet-lab at considerable pain and expense. It behooves us to make quite sure that it is presented in such a way that people can take advantage of it.”
Finkelstein acknowledges usability evaluation is not “zero cost,” but when the user’s time is taken into account, attention to usability is an “economically effective thing to do.”
More generally, the idea behind last month’s Bioinformatics paper, Finkelstein said, is to show that technology practices and techniques available in other fields can be used to create “industrial-grade web applications” in bioinformatics.
In their paper, the researchers said that ongoing activities in the field of usability-engineering in bioinformatics include the Human-Computer Interaction Lab at the University of Maryland, which is working on visualization methods for large data sets in databases. And in Montreal, scientists at McGill University have used web-based resources to observe the “typical” daily activities of bioinformaticists, which is part of usability evaluation.
Bolchini expanded on results attained by Finkelstein, Perrone, and fellow UCL researcher Sylvia Nagl by asking 10 users with two to eight years of experience in bioinformatics to perform typical search tasks, in order to study the usability of searches conducted in BioCarta, NCBI, and SwissProt. An observer chronicled the resulting experiments.
The tasks began with the genomic characterization of a breast tumor in a fictitious cancer patient with a strong family history of the disease. The teams were asked to explore genes regulated by the estrogen receptor, and to look deeper into the interaction between BRCA2, telomerase, and p53 because it could indicate if the patient has a heightened risk of mutations connected to accelerated metastasis.
Bolchini’s team observed that the study participants had difficulty understanding the terminology to use in a query because assumptions made by the search interface designers made it difficult to translate a search task into proper keywords, he said.
“The complexity of the domain knowledge is reflected in the difficulty [users have] in using the interface,” he said. Having a world of “fragmented knowledge domains” is typical in bioinformatics. “The interface does not allow users to overcome this problem [of fragmentation].”
For example, NCBI lets users search by proteins, structure, genomes, CancerGenomes, and many other domains, but users found it difficult to “properly” explore the options and needed several trials to obtain results.
“The main problem lies in the way functionality is communicated to the user,” the scientists said in their paper.
With SwissProt, a challenge is that users must select the database to search rather than being able to select a “content domain,” the scientists stated.
Although the scientists noted that the limited number of tasks and users in the study “precluded” a meaningful quantitative analysis of the results, they said it nevertheless provided “valuable insights.”
A “recurring obstacle for users” was that the given repository recognized only one form of spelling for a term such as estrogen. That meant that a search yielding “no result” could either mean “no result” or that there might be more search results but that the search engine provided no spelling support.
“There is a world of possible design solutions” for this challenge alone, Bolchini said.
The scientists recommend that bioinformatics resource designers help the user see the search scope and offer query examples and a search ontology, and that automatic alternative spelling recognition “should be systematically supported.” SwissProt, for example, captures synonyms of protein names, the scientists highlighted.
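To make the spelling recommendation concrete, here is a minimal sketch in Python of how a search layer might accept alternative spellings. It is an illustration only and not how NCBI, BioCarta, or SwissProt work: the variant table, the function names, and the sample records are all hypothetical, and a real resource would presumably draw its variants from a curated ontology or synonym list.

```python
# Hypothetical variant groups; a real resource would use a curated ontology.
SPELLING_VARIANTS = [
    {"estrogen", "oestrogen"},
    {"tumor", "tumour"},
]

# Map every known spelling to the full set of spellings in its group.
_LOOKUP = {word: group for group in SPELLING_VARIANTS for word in group}


def expand_query(query):
    """Return one set of accepted spellings per word in the query."""
    return [set(_LOOKUP.get(word, {word})) for word in query.lower().split()]


def search(records, query):
    """Return records that contain some accepted spelling of every query word."""
    terms = expand_query(query)
    hits = []
    for record in records:
        words = set(record.lower().split())
        # A record matches when each query word has at least one variant present.
        if all(words & variants for variants in terms):
            hits.append(record)
    return hits
```

With a layer like this, a query for "estrogen receptor" would also return records spelled "oestrogen receptor," so an empty result list is more likely to mean that nothing matches than that the spelling was not recognized.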
“Standard usability tells you if there is a range in usage and semantics all the way through to trivial errors [then] your users will make them,” Finkelstein said.
Even if systems are designed for scientists, he said, designers need to remember that “sophisticated audiences make unsophisticated errors,” in part because they are busy. “Don’t make rocket science like rocket science,” Finkelstein said. He said that designers must remember that scientists are not just retrieving data, but rather using the result of a query in a broader context to address a series of questions in their scientific investigation.
The usability scientists document in their paper that users also had difficulty managing long lists of results, which are the typical answer to a search query and are even expected in bioinformatics. The long lists intimidate users rather than encourage them to explore, the scientists wrote, and lead them, for example, to focus on the first, second, or third result.
Model Users
The design challenges in that instance, the scientists said, rest with the lack of visual organization of the result items such that it “does not guide their eyes to easily master the complexity of the results at a glance.” This problem can be remedied with a more “user-centered design” in which key elements might be highlighted and presented in a way that offers an overview of the large set of retrieved elements, the scientists said.
The way results are ranked is often not transparent to users, making them “feel helpless in formulating a strategy for reading the large set of results.” Another obstacle was the document title or excerpt, which “did not help much in identifying relevant content.” For example, in BioCarta, search results are marked with the letters ‘H’ or ‘M,’ denoting ‘human’ and ‘mouse,’ but no explanation of these letters is given on the first overview page.
What the usability study revealed more generally, the team said, is that if not “properly addressed,” these issues can cause users “even when strongly motivated and committed” to run into a “frustrating loop of ‘trial and error’ search queries,” as they guess at ways to obtain results.
Application designers tend to think of themselves as the model of the user, Finkelstein said. “And of course, they’re not, [the users] are different in what they are seeking to do, in their background. Putting yourself in other people’s shoes is not an easy job,” he said.
The other part of the study, Bolchini explained, focused on browsing-oriented bioinformatics applications, notably CATH, the domain classification database of proteins in the Protein Data Bank, hosted at University College London. CATH has four levels of classification: class, architecture, topology as in fold family, and homologous superfamily. “A lot of the user-experience [in CATH] is about browsing around, navigating and making sense of content,” Bolchini said.
For their analysis the scientists chose inspection, “a common technique” in usability testing, said Bolchini. “It basically consists of systematically looking at all the characteristics of a web application from different perspectives,” he said. That includes studying aspects independently, such as content structure, the information architecture, the quality of the navigational process, and the consistency of links and labels.
In the evaluation of CATH, the team used an inspection method called Mile+, a heuristics tool that Bolchini co-developed at the University of Lugano’s Technology Enhanced Communication Laboratory, TEC-lab.
Inspecting CATH in terms of its access structures and the navigational roads to the content, the scientists found that it “is some way from supporting an efficient retrieval of the desired superfamily.”
Access to the 1,459 homologous superfamilies in CATH is through 98 pages each with 15 items. Users who know a specific superfamily name must browse through these lists to find what they seek.
The scientists estimated that finding a particular superfamily, such as urate oxidase on page 51 of 98, takes 17 minutes unless the user knows the CATH code of the superfamily, which can be entered into the search engine.
“If you know a piece of a name or a name, the only browsing possibility offered to you is very inefficient, very cumbersome,” Bolchini said. One easy remedy is to alphabetize the index.
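The arithmetic and the suggested remedy can be sketched in a few lines of Python. This is a hedged illustration, not the CATH implementation: the function names are hypothetical, and apart from urate oxidase, which the study cites, the entry names used in the usage note are placeholders rather than real CATH superfamilies.

```python
import math
from collections import defaultdict


def page_count(n_items, per_page):
    """Number of pages needed to list n_items at per_page items per page."""
    return math.ceil(n_items / per_page)


def build_alpha_index(names):
    """Group names under their first letter, sorted case-insensitively."""
    index = defaultdict(list)
    for name in sorted(names, key=str.lower):
        if name:  # skip empty entries
            index[name[0].upper()].append(name)
    return dict(index)


def find_by_prefix(index, prefix):
    """Return names under the prefix's first letter that start with the prefix."""
    if not prefix:
        return []
    bucket = index.get(prefix[0].upper(), [])
    return [name for name in bucket if name.lower().startswith(prefix.lower())]
```

Here `page_count(1459, 15)` gives the 98 pages the study describes. With an alphabetical index, a user who knows only a fragment such as "ura" jumps straight to the "U" group instead of paging through the lists, for example `find_by_prefix(build_alpha_index(["Urate oxidase", "placeholder alpha"]), "ura")`.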
What the team found, said Bolchini, is a fundamental design issue, “an inconsistency in the navigation.” He added that navigation access designers need to keep “different scenarios with users” in mind, exploring situations and research modes in which a user might want to access and browse the data.
Bolchini said he and his colleagues worked closely with CATH curators and designers to inform an ongoing database and interface redesign. “The version you are seeing online is a redesigned version.”
“We got a positive response,” Finkelstein said of his CATH colleagues, in part because they are in the midst of a substantial redesign. Version 3.2 includes techniques and specific suggestions from the team. “I think the biggest contribution we can make is the usability mind-set.” Finkelstein said he and his colleagues are going to continue collaborating with the CATH team, helping with the redesign.
The plan is also to extend the usability approach to other bioinformatics systems under development. In addition to Mile+ there are “less sophisticated tools and sets of heuristics available,” Finkelstein said, and “an awful lot of good practice out there” for usability design, ranging from “very sophisticated audit tools” to straightforward tips and heuristics. He declined to name tools or projects, saying it is still too early to go into detail about the team’s “broad agenda.”
Bolchini said that bioinformatics curators and designers are mainly concerned with data accuracy and that this “may take the lead in terms of focus, effort and time.” Usability inspection, though, could be helpful in improving bioinformatics resources, he said.
One challenge in bioinformatics, he said, is that web-based resources do not grow in a linear fashion and are composed of subsystems of tools. “The tools are sort of glued together in an overall application,” he said.
Falling Off the Time Cliff
Given the way bioinformatics unfolds, the systems are updated at different times. “You have a release timeline, which is different for each of the subsystems,” Bolchini said.
Users navigate from one subsystem to another without necessarily recognizing these differences and “fall essentially into another time status,” much like walking through a museum with rooms from different time periods in history. “All these subsystems are connected through links and then managing these links throughout the timeline is a mess,” he said.
This is not just “a simple fix, it is a problem of design,” he said. He and his colleagues suggested “navigational good practices” with landmarks orienting the user and delivering a kind of branding. These links should be regularly checked because “they are the navigational glue,” Bolchini said. They could be labeled with a version number and the date of the last update clearly indicated to the user, he said.
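The labeling suggestion can be sketched as follows in Python. This is an assumption-laden illustration rather than anything the paper or CATH specifies: the `Subsystem` type, the two helper functions, and the label wording are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subsystem:
    """Hypothetical record of one subsystem's release state."""
    name: str
    version: str
    last_updated: date


def link_label(text, target):
    """Build link text that shows the target's version and last update date."""
    return (f"{text} ({target.name} v{target.version}, "
            f"updated {target.last_updated.isoformat()})")


def is_older_target(source, target):
    """True when a link leads from a newer subsystem into an older one."""
    return target.last_updated < source.last_updated
```

A site could run a check like `is_older_target` over its cross-subsystem links on each release and render `link_label` wherever the check is true, so users can see when they are about to step into content from an earlier release.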
Disseminating good usability practices can help improve research itself, Bolchini said. He began his professorship at Indiana University last fall, and his paper has launched cooperation projects with several bioinformatics researchers at IU, for example helping with large-scale projects under development.
His colleagues said it was the first paper they had seen on this subject and “they wanted to collaborate, to start to import and use [human computer interaction] design principles in all the applications they are developing.”