COLD SPRING HARBOR, NY — The annual Genome Informatics conference, held here Nov 1-5, touched upon a wide range of subjects, but some topics, such as regulatory analysis, genome browsers, and epigenomics, garnered a bit more attention among the more than 200 delegates in attendance.
Organized by Tim Hubbard of the UK's Wellcome Trust Sanger Institute, Jason Swedlow of the University of Dundee, and Michele Clamp of the Broad Institute of MIT and Harvard, the conference included sessions on regulation, pathways, and networks; pathogenic microbe genomics; assembly, annotation and resources; epigenomics; images, atlases, and reconstruction; and comparative and evolutionary genomics.
Focus on Regulatory Regions
One theme at the conference was new methods for analyzing regulatory regions of the genome.
Alison Meynert of the European Bioinformatics Institute described "Sunflower," a computational method of locating low-affinity binding sites for transcription factors, which are present in ultraconserved elements (UCEs) in mammalian genomes.
Meynert said that she and her colleagues have been concerned with how UCEs work. "Why are they so highly conserved?" she asked during her talk. Her research group in Hinxton hypothesized that UCEs act as “regulatory switches” that respond to changes in transcription factor concentration. They began their study by looking at overlapping binding sites to determine what the motifs are, she said.
Using Sunflower, the researchers were able to identify “a number of motifs” that are enriched in low-affinity sites, which suggests that enhancers “act as concentration-sensitive switches,” the researchers said in the conference abstract.
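For readers unfamiliar with the idea of binding-site affinity, the sketch below shows one common way to separate strong from weak candidate sites: scoring each window of a sequence against a position frequency matrix. It is not Sunflower's algorithm, which the talk did not detail. The matrix values, the uniform background, and the two score thresholds are all made-up assumptions for illustration.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (made-up values,
# not taken from Sunflower or from any real transcription factor).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(site):
    """Sum of per-position log2(frequency / background) for one candidate site."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PFM, site))


def scan(sequence, low=0.0, high=4.0):
    """Label each window 'high' (score >= high) or 'low' (low <= score < high).

    The thresholds are arbitrary illustrative cutoffs. Windows scoring below
    `low` are not reported.
    """
    width = len(PFM)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        score = log_odds(site)
        if score >= high:
            hits.append((start, site, "high"))
        elif score >= low:
            hits.append((start, site, "low"))
    return hits


if __name__ == "__main__":
    for hit in scan("ACGTTTACGA"):
        print(hit)
```

In this toy setup a perfect match to the motif lands in the "high" band, while a site with one mismatch still scores above background and is reported as "low," the kind of weaker site that a concentration-sensitive switch would depend on.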
Another talk, by Manolis Kellis of the Massachusetts Institute of Technology, discussed regulatory network inference using 12 Drosophila genomes. Kellis said that the study showed how animal genes are regulated both pre- and post-transcriptionally.
His group used whole-genome alignments from a dozen Drosophila species to discover regulatory targets in the organism’s genome. They then developed a phylogenetic framework to account for the evolutionary relationships between species.
Web-Based Analysis and Browsers
James Taylor, a visiting member at New York University's Courant Institute, discussed Galaxy, a web-based platform for large-scale genome analysis.
Taylor said that Galaxy allows experimental biologists to run complex analyses on "huge" data sets with "nothing more than a web browser, and without needing to worry about details of installing tools, allocating computing resources, and file format compatibility."
In a follow-up interview, BioInform asked Taylor to define “huge.”
He said, “There is no inherent limit to the size of datasets that a Galaxy instance can deal with; it is limited only by the storage and compute resources available. The public Galaxy service has no problem dealing with multi-gigabyte datasets.”
In addition, Galaxy is designed as a platform for bioinformatics developers to make their command-line software tools available to a broader user base via the system’s unified interface.
Taylor further described Galaxy as “a substrate on which you can build tools,” and stressed that while the platform offers access to a number of data sets and analytical tools, “we are not attempting to solve everything.”
The approach “saves time [by] providing existing computational tools [that can be] made accessible through Galaxy, but if a particular research group has their own tools they can ... [also use Galaxy as an] extensible framework,” he said.
Taylor said that use of the system is increasing. Between May and August, the number of jobs run on the system per day increased from just over 330 to around 600. During that time, the number of screen views per week increased from about 400 to more than 1,000.
Taylor began developing Galaxy while at Penn State University in collaboration with researchers at Brigham and Women's Hospital and Harvard Medical School. Anton Nekrutenko currently manages the project, which is hosted at the Center for Comparative Genomics and Bioinformatics at Penn State’s Huck Institutes of the Life Sciences.
Galaxy is freely available online.
Other speakers, such as Zheng Zhang from Applied Biosystems, talked about the company’s newly launched SOLiD system, a sequencing platform for high-throughput short sequencing reads, while CSHL’s Lincoln Stein discussed GBrowse, focusing on an upcoming version with rubber-banding features that is due for release in the spring (see Expression Profile, this issue). With the new version, he said, users will be able to drag chromosome views with a smooth, continuous motion and quickly collapse tracks without delays caused by page reloading.
In addition to several talks during the sessions, the subject of browsers drew a standing-room-only crowd at a "birds-of-a-feather" session Saturday afternoon in which developers from various industry segments debated some areas of contention.
Top among their concerns was the issue of speed. Jim Kent of the University of California, Santa Cruz, said that speed is significant because even a two-second wait might redirect one's train of thought when performing key functions. “Daily transfer speeds are starting to bite,” Kent said.
Other areas for debate included the naming of exons and alternative splicing; interoperability; common data formats; pan-mammalian gene names [and how to classify them]; Wiki-type browsers; community annotation; and the semantic web.
Attendees of the session agreed that the top browsers in the bioinformatics field are currently GBrowse, Ensembl, and UCSC's browser. An update on some of the browsers highlighted that UCSC is currently trying to upload all Ensembl specs and that Ensembl, like the future version of GBrowse, allows users to highlight “a little rectangle” of information. That once again drew attention to rubber banding, which came up often in conversation during networking events.
After the sessions, delegates discussed how GBrowse 2.0 could be revelatory with its rubber banding and other Google Maps-like features, and noted that the upgrade would require a full revamp of existing features.
After Stein’s talk, many delegates were abuzz about such an upgrade, including one scientist who touted the new version’s ability to “zoom in” within a given window.
Epigenomics Tools Touted
Epigenomics was another key theme at the conference. Christoph Bock from the Max-Planck-Institut für Informatik in Saarbrücken, Germany, discussed Epigraph, a software tool for analyzing multiple types of genomic data in order to predict epigenetic features.
In a conversation with BioInform, Bock explained that Epigraph has been used in various studies, including one designed to predict the predisposition to bipolar disorder and other psychological disorders. The tool is able to "assess whether this is likely to be an epigenetic effect or whether this is just a genetic effect," and whether a given region is prone to epigenetic silencing or not, which he said increases the risk of psychological distress.
Bock claimed other epigenetic research projects were successful. For instance, by integrating multiple epigenome predictions across tissue boundaries and from myriad cell types, his team found that a subset of CpG islands are characterized by a “ubiquitously open chromatin structure.” Based on this finding, the researchers developed a method for predicting CpG islands that improves upon “traditional CpG island definitions,” according to the paper abstract.
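For context, the “traditional” definitions usually refer to sequence-only criteria along the lines of those proposed by Gardiner-Garden and Frommer: a region of at least 200 bp with GC content of at least 50 percent and an observed-to-expected CpG ratio of at least 0.6. The sketch below checks a single sequence against those commonly cited thresholds. It is not the method Bock's team developed, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the commonly cited sequence-only thresholds to one region."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))  # CpG-rich toy sequence
    print(looks_like_cpg_island("AT" * 100))  # no C or G at all
```

Criteria of this kind look only at base composition, which is why a definition informed by chromatin data could plausibly refine them.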
Epigraph relies on a large epigenome database to identify common genetic features, and it automatically calculates appropriate control sets and a range of potentially predictive genomic and epigenetic features. Based on such data, the software uses statistics to identify features that are significantly different between cases and controls, and performs more advanced machine-learning analyses.
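The case-versus-control comparison described above can be illustrated with a simple permutation test on one feature. This is a minimal sketch only: the article does not say which statistical tests Epigraph uses, and the function name, feature values, and parameters below are invented for the example.

```python
import random
from statistics import mean


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the fraction of random relabelings whose absolute mean difference
    is at least as large as the observed one (with a +1 correction so the
    estimate is never exactly zero).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented values of one genomic feature (e.g. a GC fraction per region).
    case_regions = [0.90, 0.80, 0.85, 0.95, 0.90, 0.88]
    control_regions = [0.20, 0.30, 0.25, 0.10, 0.15, 0.22]
    print(permutation_p_value(case_regions, control_regions))
```

A small p-value flags the feature as differing between the two sets. A tool that screens many features at once would also need to correct for multiple testing.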
Bock said that the tool can be particularly useful when working with histone modifications as well as when performing ChIP-on-chip analyses. The tool is available by contacting Bock.
Other talks in the epigenomics session included one by Jim Kent of UCSC, who discussed the display of associations and improvements to the alignments and the gene set at UCSC.
The Case for Openness
Michael Ashburner of the University of Cambridge genetics department was introduced as a proponent of “openness,” which the bioinformatics community has embraced. In his keynote, he talked about the need to develop a resource for community annotation, such as the kind provided by Wikipedia. But he cautioned that “it’s too early to tell if these community annotation [sites will be viable], if they are sustainable.”