NEW YORK (GenomeWeb) – Researchers from the University of Utah and Boston College have released two applications on the iobio analysis platform — a web-based system they created that comprises open source data analysis tools — to help biological researchers visualize, explore, and analyze sections of genomic data in real time in an intuitive and interactive fashion.
The main iobio platform was developed by researchers in the laboratory of Gabor Marth, a professor of human genetics at the University of Utah. Marth first told GenomeWeb about iobio early this year. He said then that the system would provide applications that would allow users to analyze sections of data such as sequences relevant to specific genes of interest. These were applications that his lab was developing as part of the Genome Sequencing Informatics Tools (GS-IT) program, a National Human Genome Research Institute-funded initiative involving seven research centers and universities.
The two new applications are intended largely to help users check the quality of their data. One of these, called bam.iobio, is the primary subject of a correspondence piece that was published in Nature Methods last week. The tool inspects the quality of the content in bam files by randomly sampling sections of sequence from the files and generating alignment statistics based on them. Because it looks only at sections of sequence rather than the file in its entirety, bam.iobio can compute statistics such as average read coverage and its distribution, average fragment length, base quality values, and read duplication rates in less time and with less of a computational burden than programs such as SAMtools or BamTools. Users can change the default parameters (the tool selects 40-kilobase genomic windows at random start points along a chromosome), and it returns updated results in seconds.
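The sampling idea described above can be illustrated with a short Python sketch. It is not the iobio code: the function names, the in-memory list of read intervals, and the number of windows are assumptions made for illustration, and only the 40-kilobase default window size comes from the article. A real tool would fetch the reads for each window through an indexed region query rather than scanning a list.

```python
import random

WINDOW_SIZE = 40_000  # default window size reported in the article


def sample_windows(chrom_length, n_windows, window_size=WINDOW_SIZE, seed=None):
    """Pick n_windows random half-open (start, end) intervals on one chromosome.

    Hypothetical helper: windows may overlap one another.
    """
    if chrom_length <= window_size:
        return [(0, chrom_length)]
    rng = random.Random(seed)
    max_start = chrom_length - window_size
    starts = sorted(rng.randint(0, max_start) for _ in range(n_windows))
    return [(start, start + window_size) for start in starts]


def estimate_mean_coverage(reads, windows):
    """Estimate mean read depth using only the sampled windows.

    reads: list of half-open (start, end) alignment intervals (illustrative
    stand-in for alignments fetched from an indexed bam file).
    """
    sampled_bases = 0
    covered_bases = 0
    for w_start, w_end in windows:
        sampled_bases += w_end - w_start
        for r_start, r_end in reads:
            overlap = min(r_end, w_end) - max(r_start, w_start)
            if overlap > 0:
                covered_bases += overlap
    return covered_bases / sampled_bases


if __name__ == "__main__":
    # Synthetic data: 100-base reads starting every 50 bases, so depth is ~2.
    chrom_length = 1_000_000
    reads = [(s, s + 100) for s in range(0, chrom_length, 50)]
    windows = sample_windows(chrom_length, n_windows=5, seed=1)
    print(estimate_mean_coverage(reads, windows))
```

Because only five 40-kilobase windows are inspected, the estimate is based on 200 kilobases of a 1-megabase toy chromosome. The same trade-off, a small sample in exchange for speed, is what the article attributes to bam.iobio.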
The second application, called vcf.iobio, samples data in variant call format (VCF) files. For this particular app, the iobio development team has included files from the Exome Aggregation Consortium and the 1000 Genomes Project on the iobio site so that researchers can use them to put the app through its paces. The application site provides directions for prepping files to interact with both bam.iobio and vcf.iobio. Users have the option to access applications remotely on the team's web servers, or they can download the source code and install bam.iobio on internal servers. A third option that could soon be available will allow users to run the application on their own Amazon cloud servers, the developers said.
An important feature of these apps is that, unlike existing visualization tools that provide static views of genome data, such as the UCSC Genome Browser, they analyze data on the fly: they pull information from bam files and regenerate statistics as users move across sections of the genome or change input parameters, Marth told GenomeWeb last week. He sees these applications as complementary to existing browsers, offering researchers a first look at their data to determine its quality before proceeding with downstream analyses. Moreover, they return alignment statistics in visually intuitive graphical formats that are much easier to understand and parse than the textual output provided by some other tools, he said.
"Notably, sampling takes place where the bam file is stored (i.e., on cloud storage or a user's hard drive), and only the sampled data — a tiny fraction of the entire bam file — are ever transmitted," the researchers wrote in Nature Methods. "The alignments are then streamed to data analysis web services that produce appropriate alignment statistics in seconds before transmitting these to bam.iobio for visualization." For comparison, processing an 18-gigabyte bam file using standard software took eight hours, while bam.iobio computed the same statistics in less than 10 seconds, according to the paper.
Furthermore, "real-time visualization allows the user to experience how the statistical distributions progressively converge and become stable as sampled alignment data are collected," the researchers said. "The user can further explore the data interactively by selecting other chromosomes or chromosomal subregions, using the main read coverage panel for navigation."
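The convergence the researchers describe can be pictured with a running estimate that is updated as each batch of sampled data arrives. The Python sketch below is an illustration only, not the bam.iobio implementation; the function name and the batch format (lists of per-base depth values) are assumptions.

```python
def streaming_mean(batches):
    """Yield the running mean after each incoming batch of values.

    batches: iterable of lists of numbers, standing in for sampled
    alignment data that arrives piece by piece (hypothetical format).
    """
    total = 0.0
    count = 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
        if count:
            yield total / count


if __name__ == "__main__":
    # Each printed value is the estimate a viewer would see at that moment.
    for estimate in streaming_mean([[30, 34], [28, 32], [31, 31]]):
        print(estimate)
```

Early estimates move noticeably as new batches arrive, and later ones change less because each batch is a smaller share of the total. This is the stabilizing behavior a user would watch in a real-time display.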
Marth's team has also begun working on additional applications for iobio. They have an early version of a population clustering application that clusters data from the 1000 Genomes Project by variants while also showing the genomic region and population of origin. They've also provided an initial implementation of a variant caller comparison application, which lets users call variants using different tools and compare the variant calls they produce. In both tools, users can run the default parameters or adjust them as they see fit. Also planned is an application called flow.iobio that will enable users to construct and run analysis workflows using tools available on the iobio platform, Marth said.
The developers plan to release software libraries, most likely in February next year, that will enable members of the community to develop and launch their own apps on the platform. However, third-party developers interested in creating apps for the system can work with the iobio team now to get their apps up and running. Marth said the group has already begun working on apps with a number of unnamed partners at other institutions, and that it is open to feedback from researchers who use its apps.