Genome browsers bring the world of genomics to mainstream molecular biology. Without any easy way of accessing a sequenced genome, a new assembly is well beyond the reach of any non-computationally-oriented researcher. With a genome browser, however, biologists can go straight to their favorite genes and begin to integrate their understanding of expression, regulation, and conservation between species.
Besides choosing from several public browsers, you can choose to create a custom one using publicly available genome browser software. As frequent users of public genome browsers, we have some favorites, and we’ve also set up a couple of our own.
What’s out there
How do the current public genome browsers compare? American biologists often start with NCBI’s MapViewer, since it’s well linked to other popular NCBI databases. It’s fine for identifying genes in a genome neighborhood and getting genome sequence. The graphics, however, aren’t so easy on the eyes, and even for the human genome, the available annotation tracks are limited.
Away from the hegemony of the NCBI, EMBL’s European Bioinformatics Institute and the Sanger Institute created the popular Ensembl database with its ContigView browser. ContigView is a full-featured browser with many annotation tracks and the ability to export sequence, gene features, and PDF images. Directionality (strand) of some features, however, is not clear, and diagonal introns (the traditional representation) don’t come across well with bitmap graphics. Archived versions are available, so even after an Ensembl update, you can return to the same browser and data. Custom annotation tracks can be added by going to the obscurely named “DAS Sources” menu, which refers to the Distributed Annotation System.
Another choice is the UCSC Genome Browser, which is our favorite. This browser does basically what the Ensembl browser can do, but a little better and faster. First, it has a really big selection of annotation tracks, some of which (like the Conservation track, with data from 28-species alignments) cannot be found elsewhere. Tracks can be displayed in multiple levels of detail, and other aspects of the graphic configuration are quite nice. We like the use of diagonal marks in introns to show strandedness, along with a similar convention for many continuous features (like single-exon ESTs). Also, coding and UTR regions of a gene are differentiated by height rather than color. A very helpful tool connected to this genome browser is the sequence alignment tool BLAT, which can link a novel sequence to a genome position in seconds. That is much faster than BLAST, albeit less sensitive. We also really like the ease with which one can “Add Custom Tracks,” where we can show our own data alongside our selection of public tracks. UCSC, like Ensembl, provides all its data for download, along with the browser software.
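To give a feel for what a custom track looks like, here is a minimal Python sketch that builds one in BED format, a simple text format the UCSC browser accepts for custom tracks. The function name, the track name, and the example features are ours, made up for illustration. The sketch assumes your own coordinates are 1-based and inclusive, so it converts them to BED’s 0-based start and exclusive end.

```python
def bed_custom_track(track_name, description, features):
    """Build the text of a six-column BED custom track.

    features: iterable of (chrom, start, end, name, strand) tuples,
    with 1-based inclusive start/end (an assumption of this sketch).
    """
    # The header line tells the browser how to label the track.
    lines = [f'track name="{track_name}" description="{description}"']
    for chrom, start, end, name, strand in features:
        if start < 1 or end < start:
            raise ValueError(f"bad coordinates for {name}: {start}-{end}")
        if strand not in ("+", "-"):
            raise ValueError(f"bad strand for {name}: {strand}")
        # BED uses a 0-based start and an exclusive end, so only the
        # start shifts by one. The score column is set to 0 (unused here).
        lines.append("\t".join([chrom, str(start - 1), str(end), name, "0", strand]))
    return "\n".join(lines) + "\n"


# Hypothetical features, for illustration only.
example_track = bed_custom_track(
    "myPeaks",
    "Example lab features",
    [("chr1", 1001, 2000, "featA", "+"), ("chr1", 5001, 5500, "featB", "-")],
)
```

The resulting text can be pasted or uploaded wherever the browser asks for custom track data. Check the browser’s own documentation for the additional track-line settings it supports.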
Many model organism databases choose to use the GBrowse genome browser package because of its ease of use and installation. GBrowse software is used for popular databases such as SGD (yeast), FlyBase, WormBase, MGI (mouse), and the human HapMap project. GBrowse has most of the features and power of the Ensembl and UCSC browsers, and it also has some extras like the Flip option, so you can orient your gene left-to-right even if it’s on the negative strand of the reference chromosome. It also lets you download all annotation tracks and/or sequence as tab-delimited and FASTA files.
If you’re studying a genome that’s not already browsable or simply want to add multiple large tracks of data to a well-studied genome, try setting up a custom browser. A GBrowse browser can be set up by a single person, although it does take a while to get things working the first time. Right out of the box it runs with flat files, but for a full-scale installation, you’ll want to load your genome sequence and annotation tracks into a relational database like MySQL. Public GBrowse installations often let you download one big tab-delimited GFF file with all of their annotations, but creating your own can be tricky. Nevertheless, displaying custom data is probably the reason you’re setting up a genome browser, so creating GFF files will be a major part of that.
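As a rough illustration of what goes into a GFF file, the Python sketch below formats one feature as a single tab-delimited line with the nine columns of GFF3: sequence ID, source, feature type, start, end, score, strand, phase, and attributes. We assume GFF3 here; your GBrowse installation may expect a different GFF version, so check its documentation first. The function name and the example gene are hypothetical, and the sketch does not escape special characters (such as semicolons) in attribute values.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes,
              score=".", phase="."):
    """Format one feature as a nine-column, tab-delimited GFF3 line.

    start and end are 1-based and inclusive, as GFF expects.
    attributes is a dict such as {"ID": "gene1", "Name": "abc1"}.
    """
    if start < 1 or end < start:
        raise ValueError(f"bad coordinates: {start}-{end}")
    if strand not in ("+", "-", "."):
        raise ValueError(f"bad strand: {strand}")
    # Column 9 holds key=value pairs separated by semicolons.
    # Assumption: values contain no characters that need escaping.
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, ftype, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)


# Hypothetical gene, for illustration only.
example_line = gff3_line("chr1", "mylab", "gene", 100, 200, "+",
                         {"ID": "gene1", "Name": "abc1"})
```

Writing one such line per feature, after a first line reading `##gff-version 3`, gives a basic GFF3 file. Most of the tricky part lies in choosing feature types and parent-child attributes that your browser configuration knows how to draw.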
A whole other type of genome browser that many biologists never see is the annotation browser. On one hand, it’s quite similar to typical genome browsers in that it provides a graphical display of genes and other data aligned to the genome. On the other hand, in contrast to all of the resources above, annotation browsers are tools for producing manual annotation. The user can work through the graphical interface to edit the annotations in the database, allowing experts to refine them. Two choices here are Apollo (from the GMOD people who created GBrowse) and Argo (from the Broad Institute). Both Apollo and Argo are usually run as Java applications, and they are easier to get started with than the genome browsers above, which is a big advantage. The main disadvantage for some potential users is their design: they need lots of memory, so jumping between chromosomes in a vertebrate genome doesn’t work very well. These two tools have quite different interfaces and functionality, but since they’re so easy to set up, you can try both before deciding which you prefer. They also both have synteny viewer add-ons to compare two genomes.
Fran Lewitter, PhD, is director of bioinformatics and research computing at the Whitehead Institute for Biomedical Research. This column was written in collaboration with George Bell, PhD, a bioinformatics scientist in Fran’s group.