
Microsoft Research Launches First Version of Open Source Bioinformatics Toolkit


By Uduak Grace Thomas

Microsoft Research has officially thrown its hat in the bioinformatics ring with the launch of Microsoft Biology Foundation Version 1.0, an open source toolkit intended to familiarize life science researchers with the company's data-manipulation and data-management tools.

Simon Mercer, director of health and well-being, external research at Microsoft, described the toolkit to a room full of delegates at the International Conference on Intelligent Systems for Molecular Biology held in Boston last week.

Mercer told BioInform after his presentation that with the MBF, “we can build a platform layer of open source code so that the next project that comes along will find maybe 20 percent of its functions already built.” As a result, researchers will be able to focus less on informatics infrastructure and more on their research.

MBF is available under an open source license, and “executables, source code, demo applications, and documentation are freely downloadable,” Mercer said.

“It’s not a product and it’s not going to become a product,” he noted. “It’s intended as a way to introduce the bioinformatics and the life sciences academic community to what can be done on a Microsoft platform.”

This first version of the MBF, which follows a beta release earlier this year (BI 03/26/2010), is a “language-neutral bioinformatics toolkit” made up of three main components. The first is a set of file parsers and file writers for common bioinformatics file formats. The second is a set of algorithms that can manipulate DNA, RNA, and protein sequences, as well as multiple sequence alignments, to find evolutionary differences between the DNA sequences of different organisms. The third, a set of web service connectors to sites such as the National Center for Biotechnology Information's Blast website, completes the toolkit.

“We’ve architected the MBF library to be a plug-in model,” said Mercer. “We parse a bunch of file types out of the box but if you find that your favorite file type isn’t parsed, we’ve made it very easy for you to extend it by adding a parser.”

He continued, “If someone puts in the work to extend it and donates the work back to the project, we will re-release it as open source so the [toolkit] will grow.”
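The article does not describe MBF's actual interfaces, so the following is only a generic Python sketch of the plug-in idea Mercer describes: parsers are looked up by file extension in a registry, and supporting a new format means registering one more function. All names here (`ParserRegistry`, `parse_fasta`, `parse_tabular`, and the tab-separated format) are hypothetical and are not part of MBF.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record type for this sketch: (identifier, sequence).
Record = Tuple[str, str]
Parser = Callable[[str], List[Record]]


class ParserRegistry:
    """Hypothetical registry mapping file extensions to parser functions (not MBF's API)."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Parser] = {}

    def register(self, extension: str, parser: Parser) -> None:
        # Adding a format only means registering one more function.
        self._parsers[extension.lower()] = parser

    def parse(self, filename: str, text: str) -> List[Record]:
        # Dispatch on the (case-insensitive) file extension.
        ext = filename.rsplit(".", 1)[-1].lower()
        if ext not in self._parsers:
            raise ValueError(f"no parser registered for .{ext} files")
        return self._parsers[ext](text)


def parse_fasta(text: str) -> List[Record]:
    """Minimal FASTA reader: '>' header lines, each followed by sequence lines."""
    records: List[Record] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def parse_tabular(text: str) -> List[Record]:
    """Hypothetical user-contributed format: one 'id<TAB>sequence' pair per line."""
    records: List[Record] = []
    for line in text.splitlines():
        if line.strip():
            ident, seq = line.split("\t", 1)
            records.append((ident.strip(), seq.strip()))
    return records


registry = ParserRegistry()
registry.register("fasta", parse_fasta)   # a format supported "out of the box"
registry.register("tsv", parse_tabular)   # an extension added by a user

print(registry.parse("reads.fasta", ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"))
print(registry.parse("reads.tsv", "seq3\tAUGC\n"))
```

Registering `parse_tabular` adds a format without modifying the registry or the existing FASTA parser. That separation is what would make it practical for outside contributors to donate new parsers back to a shared library.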

During his presentation, Mercer said that a team led by Microsoft Research's David Heckerman has adopted MBF for its own work. The researchers, who are using machine-learning-based approaches to design vaccines for HIV, are now building all of their algorithms and tools, which are freely available to the academic community, on top of the MBF platform.

“The next versions of all of those tools will rely on the MBF library, which means that his program spends less time doing the basics and more time actually doing the research,” Mercer said. “In return we get a lot of matrix manipulation and advanced math built into MBF and they are available to anyone and it's all open source.”


He noted that the release of MBF gives Microsoft an opportunity to showcase its wide range of tools for manipulating and comparing large amounts of data, tools that researchers in the life science space are unfamiliar with and aren’t taking advantage of.

For example, he said that two other Microsoft offerings, Seadragon and Deep Zoom, “enable you to navigate huge amounts of data very easily and to sort it to see relationships that you couldn’t otherwise see.”

He noted that both tools are freely available, and though they are not open source, developers have access to their libraries and APIs. “There is lots of genomic information and huge amounts of data that you need to manipulate and compare, so why are we not using these technologies in the genomics space?” he said.

Alongside the MBF library, the company offers an optional tool called ShoRuntime that provides high-performance statistics and math capabilities.

Mercer also noted that in creating the current version of MBF, Microsoft was “selective” about the tools included in the suite. As such, the toolkit does not have “a huge amount of functionality.” However, he said there are plans in place to include more options in the next version of the toolkit, which will be released sometime next year.

“In version two we will be adding some extra features and connecting to a range of new technologies to make it more powerful and flexible,” he said.

Mercer further stated that the additional features will be based in part on feedback from MBF 1.0 users.

He said that the project has a technical advisory board that currently consists of three commercial groups and three academic groups, so future incarnations of the toolkit will be guided “by a group of people who [are] using the tools right now.”

Members of the board include researchers from Illumina, Johnson & Johnson Pharmaceutical Research and Development, Aditi Technologies, Cornell University, Queensland University of Technology, and the University of Texas, Austin.

Users can also make suggestions for improvements or additions to MBF, which Mercer says Microsoft encourages.

“We only do user-led development. If we can’t find a user who will use a feature, we don’t implement it so we don’t build things that nobody uses,” he said. “If someone in the community wants a given feature and they commit to using it, either we or they will add it depending on the resources we have to support the needs of the community.”
