Bioimage informatics — a branch of bioinformatics focused on "large-scale bioimage generation, visualization, analysis, and management" — has emerged as a key sub-discipline in the field, according to a recent editorial in Bioinformatics.
The article, penned by several of the journal's editors and Hanchuan Peng, a senior computer scientist at Howard Hughes Medical Institute's Janelia Farm Research Campus, outlined the journal's decision to add a new paper submission category for articles covering the discipline.
The journal created the new category to meet a "noticeable need to publish high-quality papers on bioimage informatics," the authors wrote.
They noted that "growing amounts of bioimage data are imposing additional demands on how to store, manage, and retrieve [these] datasets," and furthermore that "joint analysis of image data in combination with other biological datasets, such as genomes and gene expression profiles, is also becoming more and more commonplace."
According to the journal's website, it will now accept papers that cover "informatics methods for the acquisition, analysis, mining, and visualization of images produced by modern microscopy, with an emphasis on the application of novel computing techniques to solve challenging and significant biological and medical problems at the molecular, sub-cellular, cellular, and super-cellular levels."
The journal will also accept "informatics methods/applications/software [and] various enabling techniques ... for such large-scale studies, and joint analysis of multiple heterogeneous datasets that include images as a component," as well as "bioimage related ontology and databases studies, image-oriented large-scale machine learning, data mining, and other analytics techniques."
This week, BioInform spoke with Peng, who focuses on bioimage data mining and informatics at Janelia Farm, about this emerging field. What follows is an edited version of the conversation.
Could you provide some background about how you got into bioimage informatics?
I did some early work on automated analysis of gene expression patterns using microscopic images of the fruit fly and C. elegans, and now work on several neuroscience problems — especially on reconstructions of three-dimensional digital brain atlases and connectomes of model animals including C. elegans, the fruit fly, and the mouse. To construct these models we need to generate and take advantage of very large-scale image data sets based on 3D microscopy. We need to analyze the image data in a very high-content and high-throughput way. This type of high-content phenotype analysis is parallel to what people have been doing in other high-throughput computational biology research, such as high-throughput sequence analysis. We actually do high-throughput, high-content image analysis to map brains and try to find associations between neurons and animal behaviors.
This type of technique is not only useful for neuroscience research, but also useful for developmental biology, cell biology, and molecular biology for many different problems because phenotype screening based on microscopy is becoming one of the must-have tools for biological studies.
Bioimage informatics has been a field that many people have been involved in for about 15 to 20 years. About seven or eight years ago, many more researchers began to realize the need for general imaging informatics tools for a variety of problems. Together with a number of other colleagues, we started to organize an annual workshop on bioimage informatics. It has attracted a lot of attention in the field. Different journals and conferences like [the Intelligent Systems for Molecular Biology conference] have started paper track[s] [on] image analysis, microscopy and visualization, and so on. Journals like Bioinformatics and BMC Bioinformatics have both started paper submission tracks on this topic.
You just mentioned that bioimage informatics practitioners have been around for about 20 years even though it's just now emerging as a distinct discipline. Where have articles and studies in this field been published historically?
The field started from two major places. The first is from various microscopy methods. Once people started to have automated microscopes, about 20 years ago, they had the ability to generate a lot of images. There was a need to do image analysis. The other is from more traditional biomedical image analysis. People tried to borrow ideas from the more established biomedical image analysis field, for instance MRI or CT image analysis, and use them to process various microscopy images. Both sources have converged into what we have right now.
There are a lot of interesting papers in microscopy journals or IEEE Transactions on Medical Imaging. I contributed an early review paper on bioimage informatics, but there are also quite a few other interesting review articles in different journals: Ilya Goldberg [of the National Institute on Aging's] review of pattern recognition techniques for bioimage informatics in PLoS Computational Biology, Jason Swedlow [of the University of Dundee's] review [in the Annual Review of Biophysics], and Gaudenz Danuser [of Harvard Medical School's] review on computer vision for cell biology [in Cell], to name a few.
What are some of the challenges associated with trying to adapt techniques from these other fields to work for bioimage analysis?
Two major issues: scale and complexity of the problem.
In more traditional biomedical image analysis, the size or volume of the images is not as large as what you can get from current microscopy methods. Right now, a typical microscopy image can have a billion voxels, with XYZ dimensions like 1,024 pixels x 1,024 pixels x 1,024 pixels. Normally biomedical image analysis uses smaller images. In addition, with automated microscopy it is very easy for a biologist to generate many of these images. You could have tens of thousands of three-dimensional images tagged with multiple colors, and each [image is] gigabytes in size. Therefore, when you try to apply similar methods borrowed from biomedical image analysis to bioimage analysis, you need to think about how to scale up the particular method to make it applicable to the problem.
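To make that scale concrete: a single 1,024 x 1,024 x 1,024 volume at 16 bits per voxel is already about 2 GiB, so tens of thousands of such volumes cannot be held in memory at once. One common adaptation is block-wise processing. Below is a minimal Python sketch of the idea; all names here are hypothetical illustrations, not from any real package.

```python
import numpy as np

# A single 1,024^3 volume at 16 bits per voxel:
# 1024**3 voxels * 2 bytes = ~2 GiB, so large collections of such
# volumes must be visited one manageable block at a time.

def process_in_blocks(volume, block, func=np.mean):
    """Apply `func` independently to each block of a 3D volume.

    A stand-in for any per-block operation (filtering, thresholding,
    feature extraction); real pipelines also handle block overlaps.
    """
    zdim, ydim, xdim = volume.shape
    results = {}
    for z in range(0, zdim, block):
        for y in range(0, ydim, block):
            for x in range(0, xdim, block):
                chunk = volume[z:z + block, y:y + block, x:x + block]
                results[(z, y, x)] = float(func(chunk))
    return results

# Tiny demo volume so the sketch runs quickly; a real run would
# stream 1,024^3 blocks from disk instead of holding them in RAM.
demo = np.zeros((8, 8, 8), dtype=np.uint16)
demo[4:, :, :] = 100
stats = process_in_blocks(demo, block=4)
```

The point is only that a method written for whole small images must be restructured to visit one chunk at a time, with the per-block results merged afterward.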
The large numbers of both bioimages and the “objects,” such as cells, in these images also lead to great complexity. This actually imposes additional difficulties [when applying] the more established methods for smaller-scale problems to these current very large-scale applications.
What are some trends that you are seeing in this space?
This field is application- and data-driven. With larger and larger data sets of individual cells or other biological objects, for instance in living specimens, how to observe the objects in real time, analyze them in real time, and mine the associations effectively are all becoming more and more important.
One trend is [establishing] high-throughput, high-content analysis pipelines for important applications. There is a need for general tools. For example, in developmental biology, people are using selective plane illumination microscopy to generate a huge amount of image data quickly. [Aligning these images] effectively becomes very important [for handling] these data. Thus, general image alignment and stitching tools are important. On the other hand, these alignment tools can be reused for structural biology studies based on high-speed electron microscopy, or for neuroscience studies that compare different brains.
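The pairwise core of many stitching tools is phase correlation, which recovers the translation between two overlapping tiles from the phase of their Fourier spectra. A minimal Python sketch follows; the function name and parameters are illustrative, not taken from ImageJ/Fiji or any real stitching package, and it assumes a pure integer translation.

```python
import numpy as np

def translation_offset(tile_a, tile_b):
    """Estimate the integer (dy, dx) shift that maps tile_a onto tile_b
    by phase correlation. Assumes a pure translation and equal sizes."""
    fa = np.fft.fft2(tile_a)
    fb = np.fft.fft2(tile_b)
    cross_power = fb * np.conj(fa)
    cross_power /= np.abs(cross_power) + 1e-12  # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # FFT indices wrap around: map peaks past the midpoint to negative shifts.
    if dy > tile_a.shape[0] // 2:
        dy -= tile_a.shape[0]
    if dx > tile_a.shape[1] // 2:
        dx -= tile_a.shape[1]
    return int(dy), int(dx)

# Demo: shift an image by a known amount and recover the offset.
rng = np.random.default_rng(0)
base = rng.random((64, 64))
shifted = np.roll(base, shift=(5, -3), axis=(0, 1))
offset = translation_offset(base, shifted)
```

Production stitching tools add subpixel refinement and globally reconcile the pairwise offsets across a whole grid of tiles; this sketch shows only the two-tile step.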
In this direction, people have started to develop several software platforms so that other people are able to write their own modules and plugins on top of them, combine some of the functions, and solve their problems in a much easier way. Interesting software packages include ImageJ/Fiji and Vaa3D, to name a few. Large, centralized bioimage databases are also [available]. Interesting examples include [the University of California, Santa Barbara's] BISQUE [Bio-Image Semantic Query User Environment], [UC San Diego's] CCDB [Cell-Centered Database], the Allen Brain Atlas, et cetera.
[Other] very interesting growth areas are how to store massive amounts of bioimage data and their metadata, how to retrieve these data, and how to visualize these multi-dimensional image data or metadata. There is also a need to use bioimage analysis or other informatics methods to enable better imaging, i.e., acquisition of the data. These are different uses of the central bioimage informatics techniques, and they indicate different directions for the development of this field.
You've touched on several applications of bioimage informatics throughout the conversation such as in neuroscience but could you highlight a few other specific examples?
In developmental biology, for example — the embryo development of fruit flies, zebrafish or mice — people have started to acquire high-resolution images of the dividing cells in toto.
There are several interesting pieces of work, such as the Berkeley Drosophila Transcription Network Project for the early development of Drosophila embryos. There are about 6,000 nuclei in an early fruit fly embryo, and people need to extract individual nuclei and then track them over time to figure out where they move under the influence of the expression of different transcription factors. That is a very interesting way to try to model a transcriptional network.
Another example is [Janelia Farm's] Philipp Keller and colleagues' work on the developing zebrafish embryo. He also needs to extract individual cells from the zebrafish embryo and then track them over time. Many times, people also need to model cell division events over time based on bioimage informatics.
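The tracking step in both of these projects amounts to linking segmented objects between consecutive frames. A toy Python sketch of that linking, assuming the nuclei have already been reduced to centroid coordinates (all names are illustrative; real trackers handle cell divisions, appearances and disappearances, and use globally optimal assignment such as the Hungarian algorithm):

```python
import numpy as np

def link_frames(prev_pts, next_pts, max_dist=10.0):
    """Greedily link each centroid in one frame to the nearest unused
    centroid in the next frame, closest pairs first.

    A toy stand-in for the tracking step of an embryo-imaging pipeline.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    next_pts = np.asarray(next_pts, dtype=float)
    # All pairwise distances between previous and current centroids.
    dists = np.linalg.norm(prev_pts[:, None, :] - next_pts[None, :, :], axis=2)
    links, used = {}, set()
    # Accept the globally closest unmatched pairs first.
    for i, j in sorted(np.ndindex(*dists.shape), key=lambda ij: dists[ij]):
        if i in links or j in used or dists[i, j] > max_dist:
            continue
        links[i] = j
        used.add(j)
    return links  # index in frame t -> index in frame t+1

# Demo: three nuclei whose list order differs between the two frames.
frame_t0 = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
frame_t1 = [(10.5, 9.5), (1.0, 0.5), (20.2, 0.1)]
links = link_frames(frame_t0, frame_t1)
```

The `max_dist` cutoff is what lets a real pipeline flag unmatched objects as disappearances or new appearances rather than forcing a bad link.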
In another example, from structural biology, people need to reconstruct the 3D structure of protein complexes from image data produced by electron tomography. If one wants to map many protein complexes automatically, a relatively large-scale bioimage informatics pipeline needs to be built to process the image data.
You can easily find similar examples for almost every sub-area of biology.