
UC Santa Cruz Group Launches Genome Browser in a Box to Offer Local Data Exploration Option


NEW YORK (GenomeWeb) – Researchers at the Genomics Institute of the University of California, Santa Cruz have released a version of the UCSC Genome Browser that runs on laptops and desktops. It provides a smaller, more manageable alternative to the online version of the visualization software for users who prefer to explore private or proprietary datasets in-house.

The UCSC Genome Browser is a well-established resource used for accessing and visualizing publicly available genome sequence information. It provides access to a database that contains the genomes of about 100 species including human, mouse, and fruit fly. Researchers use the browser to locate functional genes, identify sequences associated with disease, and compare sequences between individuals and across species. It’s a popular site that gets about 1 million hits a day and serves about 150,000 users, mainly scientific researchers in academia and biotechnology companies.

The so-called Genome Browser in a Box (GBiB) provides a subset of the capabilities available in the more comprehensive resource while maintaining the same look and feel as the familiar web version. Essentially, it's a virtual machine image that simulates a server loaded with the complete browser and all the dependencies needed to install and run the software on a laptop with minimum fuss, Jim Kent, developer of the UCSC Genome Browser and the director of the Genome Browser project at UC Santa Cruz, said in a statement. The system is optimized for use with the hg19 human genome assembly, but users can also use it to view other recent assemblies. For the hg19 assembly, the default tracks are UCSC Genes, RefSeq Genes, publications on sequences and SNPs, blat-mapped RNAs and ESTs, layered H3K27Ac, DNase clusters, transcription factor clusters, multiple vertebrate genome alignments, dbSNP, and RepeatMasker.

These tracks were selected for the default GBiB installation based on their size and on how often they are used, Kent, who is also a research scientist in the department of biomolecular engineering at UC Santa Cruz, told BioInform. Additional tracks and versions of the human assembly can be accessed somewhat more slowly through remote protocols, or, if there is sufficient disk space, they can be downloaded to the user's computer for faster access.

GBiB provides a "theoretical fix to the problem of searching large databases using sensitive data," Max Haeussler, a postdoctoral researcher at the Genomics Institute and the lead engineer for the GBiB project, said in a statement. It offers an alternative to keeping sensitive datasets tucked away behind protective firewalls, where they are cut off from the large web-based databases that provide needed biological context, while also allaying the security and privacy concerns that dog the use of genomic information. It is also much easier to look at data using GBiB than with the online version of the system available on the UCSC website, since the data is stored in "local files on your laptop that the browser has access to," Kent added.

To run GBiB, users' systems should support virtualization technology and should have Oracle's VirtualBox software installed. GBiB is much simpler to install than the full version of the Genome Browser. According to the developers, it's possible to have the browser and its dependencies installed and running in an hour. That's a much shorter time frame than what's required to download and install the full version of the browser for local use, which takes a week of effort and terabytes of server space, and requires expert help to properly install things like the Apache web server and other bits of internally developed code that are needed to run the solution. With a disk footprint of less than 10 gigabytes, Genome Browser in a Box also needs much less space than its parent, Kent said, although it requires more if a user opts to download additional tracks.

GBiB is free for academic and non-profit users, but corporate researchers need to pay a licensing fee to download and use the solution. They pay a $1,000 one-time set-up fee and $1,100 per year thereafter.

That's cheaper than the cost of a regular license to the Genome Browser code, which requires a $6,000 one-time set-up fee and $1,000 per year, per user for five to 19 users. Also, smaller organizations with fewer than 500 employees have to purchase a minimum of five seats. Those with more than 500 employees have to purchase at least 20 seats, and there is a separate licensing agreement for this particular group.
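To put the licensing figures above side by side, the short Python sketch below works out the first-year cost under each scheme. It is only an illustration: the function name `license_cost` is hypothetical, and it assumes that the GBiB annual fee is a flat charge rather than a per-seat one (the figures above do not say either way) and that the regular license is bought at the five-seat minimum.

```python
def license_cost(setup_fee, annual_fee, years, seats=1):
    """Total cost: one-time set-up fee plus the annual fee per seat per year."""
    return setup_fee + annual_fee * seats * years

# GBiB: $1,000 set-up, $1,100 per year (ASSUMED flat, not per seat).
gbib_first_year = license_cost(1_000, 1_100, years=1)

# Regular license: $6,000 set-up, $1,000 per year per user,
# ASSUMED here to be bought at the five-seat minimum.
full_first_year = license_cost(6_000, 1_000, years=1, seats=5)

print(f"GBiB, first year:            ${gbib_first_year:,}")
print(f"Regular license, first year: ${full_first_year:,}")
```

Under these assumptions the first year comes to $2,100 for GBiB and $11,000 for the regular license at five seats. The gap would narrow if the GBiB fee turned out to be charged per user.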

Kent and his colleagues plan to develop at least two other versions of GBiB that could be used by larger groups of people, such as the members of an academic department. They are considering two options in particular, he said: one that will run on Amazon and potentially other cloud infrastructures, and a second that can be installed and run locally on university servers. These systems will likely make all the data and tracks in the online version of the Genome Browser locally available, he said.
