McGill University Center for Bioinformatics
There are currently more than 35 different software packages available for visualizing and analyzing biological pathways, according to a recent review article in Bioinformatics, but the authors of the study note that this growing suite of tools is still doing a poor job of meeting the needs of both of its primary user bases: bench biologists and bioinformaticians.
One reason for this problem is “a simple lack of communication” among the disciplines that make up the field (graph drawing, information visualization, network analysis, and biology), according to the authors, Matthew Suderman and Michael Hallett of the McGill University Center for Bioinformatics.
“It is unfortunate to witness the number of new tools designed for biology that have earlier analogs in other research areas,” they write. Conversely, “graph drawing researchers ignorant of emerging biological models tend to tackle layout problems of little use in biology or, at best, express solutions to relevant problems in a way that is inaccessible to biologists.”
The authors conclude that “the key to success of this endeavor is the involvement of all these communities towards standards, open-access software, and distributed development.”
A full list of the 35 software packages covered in the review, as well as several additional tools, is available here.
BioInform spoke to Suderman about the findings and how developers of network visualization tools might be able to improve their methods.
You reviewed 35 different pathway software tools for this article. What’s your take on the current state of the art for visually analyzing biological networks?
There are a lot of tools out there. I think mostly what you see is [that] either a tool is very specific to a certain purpose or, especially when you look at the commercial tools, it’s aimed more at, say, a biologist who’s not necessarily into doing a lot of statistics or computer programming or things like that. They just want to do something quick.
In our lab, we do a lot of service for collaborators and so on, and we find that they tend to like the nice polished tools [where] they can do it quick and see a few nice pictures, put it in their paper, and that’s that.
But on the bioinformatics side, we tend to be a little more critical of the tools because we’re usually trying to do something more specific, and chances are it’s not available yet. So you end up having to cobble something together to make it work.
When you say you’re trying to do something more specific, what sort of analysis are you talking about?
I can give you an example from my own research. One of my collaborators is an epigeneticist. So right now, we’re just starting to get into doing ChIP-on-chip arrays in order to profile DNA methylation. And a lot of the background work behind the arrays isn’t really done yet, in terms of normalizing the data and getting it into shape where you can start interpreting it. So basically I’ve had to develop all that from scratch.
The thing is, once I’ve done that, I want to go and do some sort of higher-level, more biological research: What is the biological meaning of this data? So my collaborator and I will sit there and look at a lot of figures, and I’ll generate a lot of figures on my own and so on. And there it’s a bit of a challenge, because I have to try to get my data into some tool that’s going to do what I want it to do.
For example, the commercial tools will support certain file formats, say for Affymetrix or Agilent. So you can just upload files, and then they have certain preset types of figures that you can create that are really nice, but they’re not very flexible. The other option is to use tools that are more special-purpose, but then the chances of finding one that actually does what you want are not very high.
So it sounds like people are still picking the best tool for any particular job on a case-by-case basis.
That was one of the reasons we wrote the review. It was just that we ourselves were looking for a tool to do what we wanted, and it’s just the tip of the iceberg when you see a few of the more common ones like Cytoscape or whatever.
We thought, ‘We’ll look at all the tools that are out there and get an idea of what the capabilities are,’ and we had no idea that we were going to end up with at least 35. At the beginning we thought we’d do a really thorough review of the five tools that were out there, and suddenly I found myself downloading a lot of software for a while. It took a couple of months to do that.
You list all these different tools, but the paper really focuses on a few of them — Pathway Studio, Cytoscape, Osprey, Patika, Visant, ProViz, and BiologicalNetworks. Is it safe to say that those were standouts in the field?
Mostly these were the ones that stood out from the others — [usually] because they were the first to have some sort of feature we were interested in. For example, there’s a tool [called] Osprey in there that was one of the first for looking at very large networks. So we were interested in that to see how they handled larger networks, because that’s still a challenge.
Each one of them is unique in some way — they support something probably better than anyone else.
I’m curious about the 25 features that you used to examine and compare these tools. Would you consider any of those features to be more important than the others from a user perspective?
It really depends on what you want to use the tool for. For us, one of the most important features was probably the ability to create a plug-in. A typical biologist doesn’t really care, because they’re not going to write plug-ins. But if you’re doing bioinformatics, this gives you the flexibility to extend the tool without having to go and create your own.
Another [one] was how well the tools handle some of the new data types. Some of the first tools that came out just looked at networks. They were little more than tools for drawing a network, without a lot of biology involved. Then, slowly, tools have added subcellular localization, time series, protein compounds, and things like that. So we were particularly interested to see how [they] incorporate these other data types into a network layout in some informative way.
You mention time series data, but it seems like very few of these tools are handling that right now. Was that surprising to you?
It’s hard to handle, so in that sense we weren’t surprised. My background is actually graph drawing, so I have a pretty good idea of how hard things are going to be to try to incorporate into a network layout. The simplest thing you can do is draw the network and then sort of fade nodes in and out, and that’s pretty typical of what people do. That’s the easiest thing. So basically you come up with one static drawing and the colors of the network will change over time.
The problem with that is that across time … if your network is really large, you’re just going to have this really ugly drawing with colors changing on it, and you’ll have no way of knowing what the relationship is from one node to another. So what you would like to do with a time series is … just draw the nodes that are really active or somehow important, and then, when you go to the next time point, have some way of showing how you got from the previous time point to the current one. And that’s difficult, because now you basically have two drawings to create, and then you have to show how one turns into the other.
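The two strategies Suderman contrasts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the review or from any of the tools: the activity values, the 0.5 threshold and all function names are hypothetical. The first function mimics the common approach of keeping one static drawing and only recoloring nodes at each time point; the others keep just the active nodes at each time point and list which nodes appear or disappear between consecutive points, the information a transition from one drawing to the next would have to convey.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: node -> activity value at each time point.
Activity = Dict[str, List[float]]
Edge = Tuple[str, str]


def recolor_frames(activity: Activity, threshold: float = 0.5) -> List[Dict[str, str]]:
    """Approach 1: one static layout; only node colors change per time point."""
    n_points = len(next(iter(activity.values())))
    return [
        {node: ("red" if values[t] >= threshold else "grey")
         for node, values in activity.items()}
        for t in range(n_points)
    ]


def active_nodes(activity: Activity, t: int, threshold: float = 0.5) -> Set[str]:
    """Approach 2: nodes considered 'active' (hypothetical threshold) at time t."""
    return {node for node, values in activity.items() if values[t] >= threshold}


def active_subgraph(activity: Activity, edges: List[Edge], t: int,
                    threshold: float = 0.5) -> List[Edge]:
    """Keep only the edges whose endpoints are both active at time t."""
    active = active_nodes(activity, t, threshold)
    return [(u, v) for u, v in edges if u in active and v in active]


def transitions(activity: Activity, threshold: float = 0.5) -> List[Tuple[List[str], List[str]]]:
    """For each consecutive pair of time points: (nodes that appear, nodes that disappear)."""
    n_points = len(next(iter(activity.values())))
    steps = []
    for t in range(1, n_points):
        before = active_nodes(activity, t - 1, threshold)
        after = active_nodes(activity, t, threshold)
        steps.append((sorted(after - before), sorted(before - after)))
    return steps


# Hypothetical three-node network observed at three time points.
activity = {"A": [0.9, 0.8, 0.1], "B": [0.2, 0.7, 0.9], "C": [0.6, 0.1, 0.1]}
edges = [("A", "B"), ("B", "C"), ("A", "C")]
print(recolor_frames(activity)[0])          # {'A': 'red', 'B': 'grey', 'C': 'red'}
print(active_subgraph(activity, edges, 1))  # [('A', 'B')]
print(transitions(activity))                # [(['B'], ['C']), ([], ['A'])]
```

The sketch only computes what changes between time points; laying out the two drawings so that persisting nodes stay put, and animating between them, is the hard part Suderman describes.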
There is a little bit of research into that. I know of at least one group that has been dealing with that, but not for biology — just for the problem in general of showing a network that changes over time.
Did you come across any surprises in this study? Were any of these tools better or worse than you were expecting in terms of any particular capability?
I think the biggest surprise was the number of tools. The other thing, and it might have been because I was somewhat new to the field when I started reading all these papers, was that I would read them and be very impressed and think, ‘Wow, this is the tool that’s going to change the world,’ and then I’d start using it and go, ‘Oh, OK, it’s not really that amazing.’ So it’s a little disappointing that way when you go from the biological jargon and you actually see what they did and it wasn’t that spectacular.
If you are coming in from the biology side, I think it’s easy to be impressed, because you see the nice pictures that someone comes up with. And some of them, like Pathway Studio, really do produce nice figures that you can insert into a paper. When I show nice pictures to biologists, they couldn’t be happier.
But pleasing a computer scientist is a little more difficult, especially when you’re trying to do analysis. When it comes to that, most of these tools are pretty elementary. A tool, for example, will advertise that it supports all kinds of network algorithms, and when you actually look at what the algorithms are, they’re things like the shortest path between two nodes, or finding the components of the network if the network is not connected or is weakly connected in certain places. These are very basic, very elementary things. They certainly do tell you something interesting, but you can do so much more. So much mathematics has been done on networks that could be applied.
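For readers outside computer science, the two “elementary” analyses mentioned here fit in a short Python sketch that uses only the standard library. It treats the interaction network as undirected and unweighted (an assumption; many pathway edges are directed), finds a shortest path with breadth-first search, and splits a disconnected network into its connected components. The toy network and function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, ignoring edge direction."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(edges: List[Edge], source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search: the first time `target` is reached, the path is shortest."""
    adj = adjacency(edges)
    if source not in adj or target not in adj:
        return None
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = parent[step]
            return path[::-1]
        for nbr in sorted(adj[node]):  # sorted only to make ties deterministic
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None  # target lies in a different component


def connected_components(edges: List[Edge]) -> List[Set[str]]:
    """Split the network into pieces with no path between them."""
    adj = adjacency(edges)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


# Hypothetical toy network with two disconnected pieces.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("E", "F")]
print(shortest_path(edges, "A", "D"))                     # ['A', 'C', 'D']
print(shortest_path(edges, "A", "E"))                     # None
print([sorted(c) for c in connected_components(edges)])  # [['A', 'B', 'C', 'D'], ['E', 'F']]
```

General-purpose graph libraries such as NetworkX already provide these routines (`shortest_path`, `connected_components`), which is the kind of existing software the interview suggests reusing.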
How would you recommend that developers of these tools address some of the drawbacks that you identified?
Given that there are so many tools around, each written completely on its own, it kind of makes me wonder whether the resources are being used well. For example, I looked at 35 tools, and very few of them really use other software. Tons of software has already been written for viewing networks and so on, but it seems the authors were either unaware of it or thought it would be too much work to try to reuse something.
So the answer to that is tools like Cytoscape, I think, because, first, they’re open source so anybody can look at their code and see how they’re doing things. And then the other thing is that you can write plug-ins, and writing plug-ins seems to have become quite popular for Cytoscape, so this allows users to really customize the tool. Cytoscape was designed with that in mind — that they weren’t going to do something too spectacular, but they were basically going to try to make it as easy as possible for people to view networks and then customize it with plug-ins.
Is there a tradeoff in something like Cytoscape, which is geared toward bioinformaticists and software developers who need that flexibility? Is that something that a biologist would have a harder time using and might prefer one of the commercial packages?
Yes. In fact you see that right now. Ingenuity is one of the top commercial tools right now, and from what I can tell, the reason is that they have a very user-friendly web interface. You don’t have to install anything. You just sign up for your membership, and then you can upload your data, and get your pictures and do your analysis. They don’t do anything too spectacular, but the thing is that they really make it easy for the user.
And there’s no reason that somebody writing a Cytoscape plug-in can’t make it easy for people to use. I’ve used a few, and sometimes there are issues because the plug-in was written by somebody doing research who’s not really interested in users so much. They’re really more interested in showing that you can do a certain thing.
Actually, BiologicalNetworks is built right on top of Cytoscape, so there’s no reason that somebody couldn’t reuse Cytoscape to do something very user friendly.
One of the things [they could do is] remove some of the options. This is one thing I found really frustrating with some of the tools written by computer science people: there are so many options to choose from that unless you’re in computer science, you’re not really going to know what you’re supposed to do next. A tool like Ingenuity [will] have wizards for doing things, so it’s very easy to ask simple questions. But a lot of labs aren’t going to have time to do that, because they’re not really software companies; their first objective is biology or computer science or something.
The paper raises the issue of standards for the pathway informatics community, but it also notes that several such standards already exist in the field and that not all groups are adopting them. Have you seen any progress in this area?
I think the [Proteomics Standards Initiative’s Molecular Interactions] file format has been adopted by almost all tools, and I think that’s a success in this area because they use a leveled approach … and that is that you start with the lowest level, which is very simple, and then you have increased levels with increased complexity.
I think this is really important because it allows people to pick the level of detail that they want to handle in a tool and then support it. Because for certain tools it’s not important that they model every aspect of molecular interactions. Maybe they just want to look at pairs of proteins and that’s it, so they can support that file format, for example, and not worry about more complex things.
For me, one of the more interesting standards is the graphical notation, and that hasn’t really caught on. What has caught on more is an informal standard that seems to have started with Pathway Studio. Now every time I see a new tool, I’m starting to see figures that look suspiciously similar to what you get with Pathway Studio, so obviously that’s been successful and biologists like it. I’ve now seen at least three tools that look so similar that when I first saw them I thought they were Pathway Studio.
What are the key things that you’ll be looking out for in these tools going forward?
In the paper we talk about open-access software, and I think that’s pretty key. For us, it’s pretty frustrating to use most commercial packages because of the fact that they are very closed, so it’s very difficult for them to offer things like plug-ins or customization because everything is proprietary. So those tools are very user friendly, but difficult for us to use and to customize. So we look for open source because we know that we could play with it if we wanted to.