TORONTO — Broadening its scope and strengthening the ties between computational scientists and experimentalists, this year’s Intelligent Systems for Molecular Biology Conference revealed the intensely collaborative nature of bioinformatics as 1,400 attendees at the main meeting and 300 attendees in pre-conference satellite sessions convened at the Metro Toronto Convention Centre this week.
Delegates attended a wide range of sessions covering second-generation sequence analysis challenges, open source software, and new analytical methods for protein structure prediction, gene regulatory networks, metagenomics, gene-disease relationships, imaging, pathway construction, and text mining.
Columbia University computational biologist Burkhard Rost, chair of the conference and president of the International Society for Computational Biology, noted that the meeting’s broad scope is representative of bioinformatics itself, which “has width and breadth beyond other fields and keeps changing from year to year.”
Rost told BioInform that over its 16-year history, ISMB has “completely changed” from a meeting almost exclusively for computer scientists to one representing experimental biology, as well. He noted that many scientists now run both wet labs and computational teams in addition to collaborating across disciplines.
Biologists are losing their computer-phobia, said Eugene Myers, a Howard Hughes Medical Institute investigator at Janelia Farm, as they draw increasingly on computational tools to “get at biology” for their discoveries.
“Biology is the motivator,” he said, noting that at ISMB researchers exchange both their knowledge about biology and the computational methods used to make new biological findings.
Where New Tools ‘Bubble Up’
Jill Mesirov, chief informatics officer at the Broad Institute and conference co-chair, noted that ISMB caters to a broad range of bioinformatics sub-communities in order to focus on the field’s rapidly evolving disciplines while maintaining continuity in subject breadth.
“There are as many faces of computational biology as there are faces in biology and biomedical research,” she said.
Although not a trade show, the event holds commercial importance, said Mesirov. “This is where things bubble up and begin to happen and it’s a great opportunity for people from industry … to come and hear what are the new frontiers in this research, what are the new data,” because eventually this science moves into the world of applications.
Some of these “new frontiers” included an expanded focus on genotype-phenotype analysis, second-generation sequencing, and image analysis, as well as discussions of cancer and other disease areas.
Michal Linial, director of the Sudarsky Center for Computational Biology at Hebrew University in Jerusalem and conference co-chair, said that the ISMB organizers felt it was time to schedule special sessions connecting new computational technologies and disease. In addition, she cited the field of image analysis as “hot and just evolving.”
To coordinate the special sessions, the conference organizers pre-selected some themes and “let the community tell us what we missed,” she said. The topic of viral-host communication, which was included in this year’s meeting, was one such proposal. Topics maturing for next year’s schedule include metabolomics, epigenetics, and computational proteomic analysis in biomarkers, she said.
“Association studies are big,” said Myers. “The genotype-phenotype connection is going to be a huge agenda for clinical/scientific applications for a long time to come. It has such a tremendous bottom line to help the human condition that’s a no-brainer.”
Thomas Hudson, president and scientific director of the Ontario Institute for Cancer Research in Toronto, agreed. “From a dearth of validated genes that cause common diseases, we now have hundreds of loci,” he said.
Hudson said that new computational and genomics tools are revealing that cancer is a heterogeneous disease and helping researchers move away from “simplistic, reductionist” approaches. “A magical drug that will affect all classes of tumors does not exist, and we have to be open to that heterogeneity,” Hudson said, noting that next-generation sequencing technologies can obtain deep sequencing data for many more patients, which should help drive these discoveries even further.
However, there are many computational challenges associated with second-generation sequencing. For example, Hudson noted that the algorithms used to filter sequencing data are still error-prone. “We could lose mutations if we start thinking they are just sequencing errors,” he said.
“I haven’t seen the algorithms implemented in large labs, and they still have to be incorporated into the analysis pipelines.” Then the technology needs to be linked to biology where “biological validation tools are the bottleneck,” he said.
There are also cost issues associated with the informatics resources required for next-gen sequencing, Hudson said. “I don’t know what to say about what small labs should expect because the big labs are struggling with the shift. For every dollar I put in genomics I have to put a dollar in informatics,” he said.
Myers noted that second-generation sequencers are “useful for digital expression and for SNPs, which is a compelling reason to buy them.” However, he added, “we need longer reads.”
Even though there are a number of new algorithms for assembling very small reads, it remains a challenge because the sequencing redundancy required to achieve a good assembly “affects your price point,” Myers said.
For now, he said, second-generation sequencers don’t present completely novel computational problems, but rather “issues of scale and size.” With increased ambiguity comes the need to “wrestle with that harder.”
Myers said that although he still enjoys the challenges of sequence analysis, he “was cruising around for the next thing” a few years ago and discovered imaging when he got the chance to look in a microscope for the first time. “Holy cow, you can see that?” he recounted as his first words when he viewed cell division for the first time and realized “I can watch what that program, the genome, is doing.”
Myers has been developing methods to capture those goings-on quantitatively. Understanding linkages in networks requires grasping subtleties. “If a chain of dependencies is long then a break early in the chain might only result in [a] 15-percent change in phenotype somewhere else, so you want to have quantitative, not just qualitative phenotypes.”
Much of this quantitative imaging work, which requires experimental methods such as tagging proteins with fluorophores, only became possible after the genome was sequenced, he said. Yet classic problems in image analysis such as registering, segmenting, characterizing, and annotating objects have not yet been solved, he said.
Myers cited other researchers, such as Carnegie Mellon’s Robert Murphy, who spoke at this year’s ISMB as part of the first wave of computational image analysis scientists. “You can see it’s growing,” he said.
Keynotes Point Toward Future Tools
The ISMB keynote speeches made it clear that biology will continue to deliver more computational challenges for the bioinformatics community to solve, while highlighting ways that current methods are already aiding biological discovery.
The Broad Institute’s Aviv Regev, whose work includes creating mathematical models for gene expression networks in different cell types, remarked that the keynote sessions were designed to get bioinformatics researchers thinking about solutions.
“The earlier on the computational scientists become involved, the better all around,” she said. Regev said that ISMB has been an important collaborative catalyst for her projects, where research has been devised, propelled, and completed from one meeting to the next.
“We see the huge presentations of data, and a year from now we will see the fruits of it,” Linial said.
Keynote highlights included a presentation by Hebrew University’s Hanah Margalit, who spoke about the rich repertoire of genomic regulatory patterns revealed through computational methods.
Claire Fraser-Liggett, director of the Institute for Genome Sciences and professor of medicine at the University of Maryland School of Medicine, spoke about the current state of metagenomics, highlighting that it is now possible to extend experimental and computational genomics advances to the study of microbial communities. So far, mainly microbes that grow readily in the laboratory have been sequenced, giving a “skewed view of the natural microbial world,” she said, since microbial communities exhibit complex interdependence and interactions such as lateral gene transfer.
Shoshana Wodak, scientific director of computational biology at Toronto’s Hospital for Sick Children, highlighted the need to collaborate “hand-in-hand” with clinicians. We are getting to the point “where we speak the same language,” Wodak said in her keynote address.
Meanwhile, ISCB’s vice president Reinhard Schneider, a bioinformatician at the European Molecular Biology Laboratory, added that the door between bioinformatics and chemistry is opening more widely through initiatives such as PubChem, which has enabled chemical data to be integrated with drug-related computational studies.
During the coffee breaks, people clustered in front of the ISMB job board, which included a mix of about 20 industry and academic positions and fellowships. PhD students who spoke to BioInform described their excitement about the dynamics of the field, but confessed some bewilderment about its swift changes and emerging subcommunities and wondered how best to forecast where the jobs might be in a few years.
According to the field’s current leaders, that world will be shaped by cooperation. Bioinformaticists “may be the people who collaborate best with others,” said Linial. “It’s not in the agenda, not written, but it is underlying sentiment.”
Younger computational biologists are learning not to be “a service person, but to be part of the development, of great discovery,” she added.
Bridging scientific and medical disciplines will be essential for large-scale projects like the International Cancer Genome Consortium to succeed, OICR’s Hudson said. “The computational problems are huge but an informatics person has to work with many colleagues,” he said.
For example, “in early detection that would be the pathologists, radiologists, chemists who change biomarkers into imaging probes.” To obtain better results there, “we really need integration of teams. For example, the pathologist needs to understand that maybe the informatics person can do a better job with the image analysis,” Hudson said.
One example of an emerging collaborative fabric among younger scientists in bioinformatics is a group effort by Rune Linding of the Institute of Cancer Research in London and Lars Juhl Jensen of EMBL, who met at EMBL and began collaborating there, and soon pulled in other partners such as Martin Miller and Søren Brunak of the Technical University of Denmark, Tony Pawson at Mount Sinai Hospital, Michael Yaffe at MIT’s Center for Cancer Research, and Peer Bork at EMBL.
At ISMB they presented a collaborative computational framework for predicting protein phosphorylation sites that is based on an algorithm called NetworKIN.
NetworKIN models the context of kinases and substrates and combines it with consensus sequence motifs, yielding a 2.5-fold improvement in accuracy over other protein phosphorylation prediction methods, they said. Linding and Jensen also presented a database of predicted kinase-substrate relations.
Jensen said that the project benefits from the contribution of several different tools. “In the first version of NetworKIN we based it on merging a set of predictors from [the Technical University of Denmark’s] NetPhosK with a set of predictors from [MIT’s] Scansite and then bundled that together with the protein interaction database in the STRING database, which I am one of the main developers of at EMBL,” Jensen said.
Three tools were combined in order to improve the ability to predict which candidate kinase phosphorylates a given site. “Basically we saw that you could go from 20 percent accuracy to on the order of 50 percent accuracy,” he said. “Normally in this business when people make methods the improvement is five percent more than the previous method,” he added.
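The general idea Jensen describes — ranking candidate kinases for a phosphorylation site by combining a sequence-motif score with a network-context score — can be illustrated with a toy sketch. This is not the actual NetworKIN implementation; the kinase names, scores, and the simple multiplicative merge below are all illustrative assumptions:

```python
# Toy sketch of a NetworKIN-style scoring scheme: merge a sequence-motif
# score (as NetPhosK or Scansite might supply) with a network-context
# score (as a STRING-style association score might supply) to rank
# candidate kinases for one substrate site. All values are made up.

def combined_score(motif_score, context_score):
    """One simple way to merge the two evidence channels: multiply them."""
    return motif_score * context_score

# Candidate kinases for a hypothetical phosphosite:
# (motif match score, network association with the substrate), both in [0, 1].
candidates = {
    "CDK1": (0.80, 0.90),  # strong motif match, close to substrate in network
    "PKA":  (0.85, 0.10),  # strong motif match, but no network support
    "CK2":  (0.40, 0.60),
}

ranked = sorted(candidates,
                key=lambda k: combined_score(*candidates[k]),
                reverse=True)

# Context demotes PKA despite its strong motif score.
print(ranked)
```

The point of the sketch is that sequence motifs alone cannot distinguish kinases with similar specificity; adding contextual evidence (interactions, co-expression, shared pathways) is what produced the accuracy gains Jensen describes.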
Linding is currently setting up a new center in London and Jensen is starting a lab at the University of Copenhagen. “We are looking to put a lot more people on this project,” Jensen said. “And we are going to have close collaborations on this.”
But while collaboration is a given for Jensen and his colleagues, “Reproducibility is really an important issue for us,” he said.
“Sometimes you download data from a group’s webpage and it is different from the data they published or from the data they uploaded to a database. That is trouble,” he said.
Jensen’s concerns were echoed by others at the conference. HHMI’s Myers said that unlike biological discoveries, computational techniques often cannot easily be published in enough detail to ensure scientific reproducibility.
Likening bioinformatics tools to the metal detectors one might use to find gold doubloons on the beach, he said, “You get to report on the doubloon in a major journal.” However, for “the metal detector, there is no real outlet [to describe it], but it’s important, wasn’t it?”
The Broad Institute’s Regev echoed the sentiment that the focus in publishing is often on the biological discovery more than the method. Published science “does not leave enough room and reviewing capacity to actually get the methods described and reviewed to the standard that is needed so that they are not only reproducible but they can be built upon and improved and inspire the next step, which is what makes science work,” she said.
“We need to be able to look at each other’s results and methods — that includes algorithms and software — to do science well,” said Mesirov.
David Rocke, professor of biostatistics at UC Davis School of Medicine, agreed. “You can’t publish a paper in a good medical journal without providing details of the biostatistics used in the study,” nor can you get a grant from NIH without disclosing methods.
However, “we don’t have those rules in bioinformatics and what that leads to is irreproducible research,” he said.
Rocke recommended that authors offer supplementary materials or links to that material on other websites. “It should be possible and perhaps be required to publish the scripts that are used to do the analysis. The programs should be accessible in some form, so that others can reproduce the analysis,” he said.
Why doesn’t that currently happen? “I think it is largely, to be pejorative, laziness by the authors, because once you have the paper accepted it is actually a lot of trouble to go through and provide the exact specifications of what you did so that people can reproduce it.”
As part of an attempt to encourage reproducibility, the ISCB announced a new software sharing policy during the meeting that recommends that scientists share research results, including software and algorithms (see related feature, this issue, for more details).
This policy is a “good step in the right direction,” Sean Eddy, a computational biologist and Howard Hughes Medical Institute investigator at Janelia Farm, told BioInform via e-mail.
ISCB president Rost said he is pleased with the direction the society and the conference itself are taking. “ISMB moved from being a meeting about computer science with no relation to biology whatsoever to one where people like me are the regulars and where you have to show that what you do has some implication for experimentalists,” he said.
Next year’s ISMB will be held in Stockholm June 27 – July 2.