NEW YORK (GenomeWeb) – A research team from the University of California, San Diego has developed an open-source library of tools called the Genomic Interaction Visualization Engine (GIVE), which lets users generate and implement personal browsers for visualizing genomic information without requiring specialized knowledge or expert help.
Full details of GIVE are provided in a recent paper published in Genome Biology. According to Sheng Zhong, a UCSD professor of bioengineering and one of the authors of the paper, the idea behind its development was to create a lightweight visualization tool that works essentially like Google Maps.
Zhong pointed to telecommunications advances that have resulted in efficient smartphones able to do much of the work historically done by computers as part of his team's inspiration for developing GIVE. Their hope, he said, was to offer an alternative to current genome browsers, some of which can require access to significant compute infrastructure and specialized expertise to implement and run. GIVE browsers ideally would be small enough to run on smartphones and be shared as simple email attachments.
In recent years, there have been efforts to make genome browsers more accessible to users. For example, in 2014, researchers at the University of California, Santa Cruz released a version of the UCSC Genome Browser for laptops and desktops to provide a smaller and more manageable alternative to the online version of the visualization software. The group also released a version of the software that is designed to run on cloud infrastructure, offering yet another option for users who may not have sufficient hardware in-house to host the solution.
There are also solutions that purport to provide tools to researchers building their own genome browsers. One example is myGenomeBrowser, which was developed by researchers at the University of Toulouse. According to a paper describing the resource that was published last year in Bioinformatics, it offers a web-based environment for researchers to build, query, and share their genome browsers with little expert help.
However, there are differences between the two solutions, according to Zhong. For example, GIVE can display genome interaction data whereas myGenomeBrowser cannot, he said. Also, "GIVE can provide data visualization to a large collection of datasets including all ENCODE datasets without any downloading, installation, or programming," he said. In contrast, myGenomeBrowser users would need to download and install the browser and configure the datasets. Moreover, "GIVE allows for sharing of data to collaborators by sending a small file, several kB in size, and the collaborator can visualize the data with any web browser [and] without installing any program," he added.
GIVE is composed of a tag library, which provides the HTML tags used to generate browsers for visualizing different kinds of genomic data, including single-cell transcriptome data, epigenomic data, RNA-chromatin interaction data, and more. It also includes the GIVE-Toolbox, which provides command-line instructions for performing necessary database operations, and the GIVE data hub, a web page for browsing metadata from genome datasets hosted on public data servers, including the data type, description, and web address of each dataset.
Embedded in the GIVE data hub is the HTML Universal Generator (HUG), which is designed to generate HTML code for visualizing genomic regions. It is one of the ways researchers can implement GIVE browsers on their websites and is designed specifically for viewing subsets of larger consortium datasets on personal webpages, Zhong explained. Researchers interested in visualizing public datasets that already have metadata in the GIVE data hub can simply add the HUG-generated code associated with the genomic regions of interest to their websites. For other datasets, instead of using HUG-generated tags, researchers can generate and implement GIVE browsers by adding a few lines of code provided by the developers to their webpages.
At the core of the GIVE technology are two new data structures, Oak and Pine, and a memory management algorithm called Withering, which Zhong and his colleagues developed to better control the amount of data stored on users' devices. Oak is designed to handle sparse data tracks stored in the BED format, such as genome annotations, gene tracks, peak tracks, and interaction regions, the developers wrote. Pine, for its part, handles dense data tracks in the bigWig format. Together, these tools formalize three concepts that are central to how GIVE functions: the browser transfers only the data in the genomic region that the user wants to see, reuses data as much as possible to minimize repeated data transfer, and transfers data at the resolution that the user's current view requires.
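The three concepts in the preceding paragraph can be illustrated with a small sketch. This is not GIVE's Oak or Pine implementation, which the article does not detail; it is a hypothetical Python cache (the names `RegionCache`, `missing_ranges`, `merge_interval`, and `pick_bin_size` are invented for illustration) that tracks which intervals of a chromosome are already held locally, reports only the gaps that would still need to be transferred, and picks a bin size from the width of the viewing window. For simplicity it assumes a single resolution per cache.

```python
from dataclasses import dataclass, field


def missing_ranges(cached, start, end):
    """Return the parts of [start, end) not covered by `cached`.

    `cached` must be a sorted list of non-overlapping (start, end) tuples.
    """
    gaps = []
    cursor = start
    for c_start, c_end in cached:
        if c_end <= cursor:
            continue          # interval lies entirely before the cursor
        if c_start >= end:
            break             # interval lies entirely after the window
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def merge_interval(cached, start, end):
    """Add [start, end) to `cached`, merging overlapping or adjacent intervals."""
    merged = []
    for c_start, c_end in sorted(cached + [(start, end)]):
        if merged and c_start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], c_end))
        else:
            merged.append((c_start, c_end))
    return merged


def pick_bin_size(start, end, pixels=1000):
    """Choose bases per bin so the window maps to roughly `pixels` bins."""
    return max(1, (end - start) // pixels)


@dataclass
class RegionCache:
    """Hypothetical client-side record of which regions are already loaded."""
    cached: dict = field(default_factory=dict)  # chrom -> sorted intervals

    def request(self, chrom, start, end):
        """Return the sub-ranges that still need transferring, then mark them loaded."""
        have = self.cached.get(chrom, [])
        to_fetch = missing_ranges(have, start, end)
        for gap_start, gap_end in to_fetch:
            have = merge_interval(have, gap_start, gap_end)
        self.cached[chrom] = have
        return to_fetch
```

In this sketch, a second request that overlaps an earlier one returns only the uncovered portion, which is the "reuse" idea in miniature; a wider window yields a larger bin size, so less detail would be transferred per base.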
Zhong provided an example of a user who creates a browser for a large segment of the genome but then spends time studying much smaller fractions of that large segment. The initial dataset is not deleted immediately; it is erased only if the user does not zoom back out to view it within a certain number of interactions. In this case, "[if] the user previously started with chromosome 1 but then their next five actions were related to chromosome 5 and [they] never went back to chromosome 1, after some time the chromosome 1 data gets erased," he explained.
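Zhong's chromosome 1 example can be mimicked with a simple idle counter. The sketch below is an assumption-laden illustration, not the published Withering algorithm: the class name `WitheringCache`, the `max_idle_actions` threshold of five, and the per-chromosome granularity are all hypothetical choices made to match the example in the quote.

```python
class WitheringCache:
    """Hypothetical cache that erases entries left untouched for too many actions."""

    def __init__(self, max_idle_actions=5):
        # Assumed threshold; the article only says "a certain number of interactions".
        self.max_idle_actions = max_idle_actions
        self._data = {}
        self._idle = {}

    def access(self, key, loader):
        """Record one user action on `key`, loading it if absent, then age the rest."""
        if key not in self._data:
            self._data[key] = loader(key)
        self._idle[key] = 0  # the touched entry is fresh again
        for other in list(self._data):
            if other == key:
                continue
            self._idle[other] += 1
            if self._idle[other] >= self.max_idle_actions:
                # The entry has "withered": drop it to free memory.
                del self._data[other]
                del self._idle[other]
        return self._data[key]

    def keys(self):
        """Return the set of keys currently held in memory."""
        return set(self._data)
```

With the assumed threshold of five, loading chromosome 1 and then performing five consecutive actions on chromosome 5 leaves only the chromosome 5 data in the cache, while returning to chromosome 1 at any point before that would reset its counter.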
The researchers also claim to have created a new method of visualizing data that they have implemented in GIVE. Rather than presenting the genome in a linear fashion as some browsers do, GIVE browsers display genomic coordinates as parallel lines with the interactions between regions displayed as links between the top and bottom layers. The intensity of the interactions between regions is displayed using a color scale.
The ability to combine interaction data with genomic information is crucial for researchers with the advent of large-scale consortia that are looking at various kinds of interaction data, such as interactions between proteins and DNA or RNA-RNA interactions, Zhong said. Existing methods for visualizing interaction data include Circos plots and 2D matrices. GIVE's double-layer approach is a further option, the benefits of which include being able to simultaneously display different genomic regions and visualize long-range interactions. For example, the top layer can be a section of chromosome 1 while the bottom layer can be a portion of the X chromosome or any other desired combination, Zhong said.
Furthermore, the two layers don't have to be at the same resolution. The upper layer could be focused on a specific gene locus while the bottom layer could be a full chromosome. Another benefit of this way of visualizing the genome is that interactions between RNA and DNA, for example, are more intuitive, the researchers wrote in Genome Biology.
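The double-layer idea, with each layer at its own resolution, comes down to mapping each end of an interaction through a different coordinate scale. The following is a minimal sketch of that mapping and not GIVE's rendering code: the `Layer` class, the `link_endpoints` function, the pixel dimensions, and the region coordinates used in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One horizontal track showing the genomic window [start, end) of `chrom`."""
    chrom: str
    start: int
    end: int

    def to_x(self, pos, width_px):
        """Linearly map a genomic position to a pixel offset in this layer's own scale."""
        return (pos - self.start) / (self.end - self.start) * width_px


def link_endpoints(top, bottom, top_pos, bottom_pos,
                   width_px=800, top_y=0, bottom_y=100):
    """Return the two (x, y) points of a link joining a top-layer and a bottom-layer position."""
    return (
        (top.to_x(top_pos, width_px), top_y),
        (bottom.to_x(bottom_pos, width_px), bottom_y),
    )
```

For instance, with a top layer `Layer("chr1", 1_000_000, 1_100_000)` zoomed to a 100 kb locus and a bottom layer `Layer("chrX", 0, 160_000_000)` spanning an assumed 160 Mb window, an interaction between the midpoints of the two windows is drawn as a vertical link, even though the layers differ in scale by more than a thousandfold.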
For their next steps, the researchers plan to provide personal password protections for GIVE users, Zhong said. They also plan to help browser users designate their personal datasets as either private or public, he said.