Although visualization techniques for biological data are no longer limited to Excel histograms and bar charts, there is still plenty of room for improvement, according to attendees of a recent conference on biological visualization.
At last week's second annual Workshop on Visualizing Biological Data, or VIZBI, participants stressed the need for methods that can represent increasingly complex relationships in the face of growing datasets and more detailed information about biological interactions.
The workshop, hosted by the Broad Institute and held in Cambridge, Mass., covered a broad swathe of visualization challenges across multiple biological scales, from single genomes to population-level data.
During his opening remarks, Broad Institute Director Eric Lander noted that out of all the scientific disciplines, biology has the "messiest data," which quite often doesn't "fit" current visualization paradigms.
Even for a single genome, he said, there are SNPs, insertions and deletions, and rearrangements, as well as correlations among these genomic variations, all of which need to be represented in a way that makes sense to researchers.
Beyond that, other relevant information, such as chromatin maps and functional data, needs to be layered on top of the genomic data, and all of these data types and their interactions with one another need to be represented visually.
It is precisely because of this "multifaceted" and "interconnected" nature of biology that the visualization community needs to develop better methods for representing data and to address recurring issues of usability and interoperability, Sean O'Donoghue, a structural and computational biologist at the European Molecular Biology Laboratory, told BioInform during the conference.
"[Bioinformatics] tools all need to talk to each other," said O'Donoghue, who was also a VIZBI organizer. He noted that in order for that to happen, "the people who make them have to start talking to each other."
He added that VIZBI was organized to facilitate that process and to provide "a forum for the developers of the widely used tools in biology and tool users to come together."
Nils Gehlenborg, a research associate in the Center for Biomedical Informatics at Harvard Medical School, added that growing datasets are also making it more complicated to build the requisite compute infrastructure for useful visualization tools.
"When you think about visualization, you can't think just about how to visually represent the data," said Gehlenborg, also a VIZBI organizer. "For any practical application, what most people spend a lot of time on is making the data accessible and [creating] this backend [on which they can] run a visualization system — in particular if you want to build software that can be used by more than just one person."
One presentation demonstrated some of the shortcomings of visualization for alternative splicing data. Yoseph Barash, a senior research fellow at the University of Toronto, noted that splicing graphs, which are one way to visualize alternative splicing data, can show connections between exons but leave out information on RNA isoforms, for instance.
He pointed out that one of the problems with current methods, in the area of alternative splicing as well as more generally, is that data is represented either as strings, which form "the base of ... the visualization" with everything else built on top of them, or as flat files that add attributes relating to those strings.
Barash argued that a better approach would be to have a "relational paradigm" that represents different entities, such as genes or exons, and the "relations" between them.
John Quackenbush, a professor of biostatistics and computational biology at the Dana-Farber Cancer Institute, noted that one issue that may hamper the adoption of new approaches for visualization is that scientists have a tendency to stick to what they are used to.
In many cases, "standard" tools and metaphors for representing data may not be the best options and could actually "hide a lot of the real features of the data," he said.
In his presentation, Quackenbush described the MultiExperiment Viewer, a Java-based application that analyzes, visualizes, and mines large-scale genomic data.
He noted that although expression analysis "has evolved over time in terms of the quantities of data that we've generated," visualization tools "haven't advanced all that much."
In fact, heat maps are still the "fundamental tool" for representing RNA-expression data, even though they have been in use for several years, he said. "I think there is great opportunity within this community to try and take that data further."
Other genomic subdisciplines face similar issues. For example, Bradley Bernstein, an assistant professor of pathology at Harvard Medical School, noted that epigenomic data is just as useful for research as genome sequence data, but that the field has “largely failed” to develop interfaces and tools that would enable researchers in the larger community to access and use its information.
VIZBI was created with the aim of addressing some of these issues, said O'Donoghue.
About 184 participants attended last week's meeting, up from about 130 at last year's workshop, which was held at the European Molecular Biology Laboratory in Heidelberg, Germany.
Videos of this year's VIZBI presentations and the speakers' slides will be available on the conference website soon.
Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.