NEW YORK (GenomeWeb) – Researchers in the Genome Browser group of the University of California, Santa Cruz have released a new iteration of the UCSC Genome Browser that is designed to run on cloud infrastructure.
The UCSC group first mentioned its plans to release the so-called Genome Browser in the Cloud (GBiC) in an update paper published last year in Nucleic Acids Research. It is similar to the Genome Browser in a Box (GBiB), which was launched in 2014 to provide an onsite version of the broader web-based resource. The UCSC Genome Browser is used by more than 130,000 biomedical researchers in academia and biotechnology companies to visualize and explore publicly available genome sequence information from nearly 100 organisms, including human, chimp, mouse, and fruit fly. Researchers use the browser to locate functional genes, identify sequences associated with disease, and compare sequences between individuals and across species.
"GBiC is a little between making a Genome Browser mirror site and GBiB," Ann Zweig, senior engineering and project manager for the UCSC Genome Browser, said in an interview. "It takes a little bit of command line work so it's not as simple as just double clicking on an application … but you can install it on an external cloud like Amazon or Microsoft or on your own … company's internal cloud system."
To use GBiC, interested users can download the virtual image from the Genome Browser store and install it in any UNIX-based cloud instance of their choice. One of the benefits of GBiC is that it allows users to share their datasets with other specified users, according to Zweig.
"Say I am working in a lab in Santa Cruz, with someone in a lab in Tel Aviv, and someone in a lab in New York and we want to collaborate but we don't want other people to see our data," Zweig said. "I can install GBiC and I can control who has access to it."
Users can do this to some extent with the web-based version of the browser by sharing the URLs associated with their datasets, but GBiC takes that a step further by allowing users to set up their own instances of the browser that are accessible only to select partners, she said.
GBiC provides all of the same functionality as the web-based version of the browser but does not include all of its data, so users need to download those datasets themselves from the main genome browser site, Maximilian Haeussler, a research scientist in UCSC's School of Engineering and a GBiC engineer, said in an interview. "Storage in the cloud is expensive so if you want to download our seven terabytes of data, you can easily store it on a hard disk," he said.
While GBiC is intended specifically for cloud-based installation, it also replaces the manual installation process for mirroring the web-based UCSC Genome Browser in multiple environments, including cloud servers, dedicated servers, and laptops. It is much faster to install than creating a mirror site, according to the developers. "What historically took at least a week now typically is less than an hour," Genome Browser author and principal investigator Jim Kent said in a statement. It is also as secure as GBiB, the developers said.
GBiC and all other iterations of the UCSC Genome Browser are free for non-commercial users, including non-profit organizations, academic institutions, and individuals. However, corporate users are required to purchase licenses. To use GBiC, they pay a one-time setup fee of $2,000 and subsequently pay an annual fee of $1,000 per instance. Instructions for installing and running the system are available online. Although the solution is targeted towards people with some informatics experience, when "we tried GBiC with people who have no command line experience, they got it working just by copying and pasting the commands without any problem," Haeussler said. "It's not a big hurdle at all."
For comparison, to license GBiB, corporate entities pay a one-time setup fee of $1,000 and then $1,100 per annum. Several hundred GBiB installations are currently in use, mostly in academic settings and at some companies, according to Haeussler. Meanwhile, commercial customers who want to purchase the UCSC Genome Browser source code pay a one-time setup fee of $6,000 and then an annual fee of $1,000 per user for five to 19 users. Full details of system requirements for running each of these resources are also provided.
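To make the GBiC and GBiB fee structures easier to compare, the following is a minimal sketch of the cumulative cost of a single corporate license over time. The `License` class and `total_cost` method are hypothetical names introduced only for illustration. The sketch assumes one instance, that the setup fee is charged once, and that the annual fee is charged for every year of use including the first; the actual billing terms may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """Hypothetical model of a corporate license (illustration only)."""
    name: str
    setup_fee: int   # one-time fee in USD, as quoted in the article
    annual_fee: int  # yearly fee in USD, as quoted in the article

    def total_cost(self, years: int) -> int:
        """Cumulative cost for one instance after `years` full years.

        Assumption: the annual fee applies to every year, including the first.
        """
        if years < 0:
            raise ValueError("years must be non-negative")
        return self.setup_fee + self.annual_fee * years


# Fee figures taken from the article; a single instance is assumed.
GBIC = License("GBiC", setup_fee=2000, annual_fee=1000)
GBIB = License("GBiB", setup_fee=1000, annual_fee=1100)

if __name__ == "__main__":
    for years in (1, 5, 10, 15):
        print(f"{years:>2} years: "
              f"GBiC ${GBIC.total_cost(years):,} vs "
              f"GBiB ${GBIB.total_cost(years):,}")
```

Under these assumptions, GBiB is cheaper at first because of its lower setup fee, the two totals are equal after 10 years, and GBiC is cheaper after that because of its lower annual fee.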
Moving forward, the developers plan to make it easier for users of the web-based version of the Genome Browser to set up hubs as well as to add their data to mirror browser sites, Haeussler said. Existing tools for researchers who use the main Genome Browser site include track hubs, which let users host annotation files that they have created for a genome already available in the Genome Browser on a separate webserver and then visualize those files in the browser. A separate tool, assembly hubs, lets users visualize genomes not currently available in the browser.
"Both track and assembly hubs require that users have a webserver or at least a full cloud storage provider like Amazon S3 where they can put files," Haeussler said. "We currently cannot help a lot with that, but we hope that most users know how to put files onto the internet. They can always contact us if they can't, we can help with finding a suitable webspace provider."
The group also offers a custom tracks option that, as the name implies, allows third-party users to contribute tracks of interest that may have been excluded because the annotation track data is too specific to be of general interest or can't be shared until the associated paper has been published. However, "we're moving away from that, as it means that we have to store people's data, and people have too much data these days," Haeussler said.