
Microsoft Transfers Open Source Bioinformatics Suite to Nonprofit in Bid to Boost Participation


By Uduak Grace Thomas

Microsoft Research said this week that it has handed the reins of its open source bioinformatics toolkit to the Outercurve Foundation, a nonprofit group charged with encouraging corporate participation in open source software projects.

As part of the transfer, the name of the toolkit has been changed from the Microsoft Biology Foundation to .Net Bio.

Outercurve Foundation, which is sponsored primarily by Microsoft, has added the .Net Bio project to its Research Accelerators Gallery, one of three galleries it has created to host open source software projects.

Outercurve took charge of MBF in order to make it "broadly available to the academic and research communities," Rick Benge, a community program manager in the health and wellbeing arm of Microsoft Research, told BioInform in an e-mail.

Simon Mercer, Microsoft Research's director of health and wellbeing, explained that although several groups (such as Johnson & Johnson Pharmaceutical Research and Development and cBio, a bioinformatics services provider) have contributed to the toolkit, most of the code comes from Microsoft's own developers, which is at odds with MBF's goal of serving as a community-developed resource.

Looking to improve MBF's participation rates, Microsoft Research "divested ourselves of ownership," Mercer told BioInform.

"We thought it would be better if the project sits in an open source foundation," he said. "It sets the right precedent in the sense that it's not a Microsoft-owned and -dominated project ... we want to be a member of that community, we don’t want to be the only player ... or have an inappropriate weight."

Furthermore, "we didn’t want to call it MBF because it's perpetuating that wrong picture of Microsoft as the only community instead of as a partner in that community," Mercer said. "That was why we changed the name to .Net Bio."

Mercer believes these changes should help clear up misconceptions about Microsoft's role in and ownership of MBF and, he hopes, encourage more developers to participate.

Moving forward, Microsoft Research developers will continue to create and add code to .Net Bio, Mercer said, although the group hopes to act more as a partner to other developers in the community than as a leader.

Also, Microsoft Research employees are using .Net Bio in their own projects and plan to contribute the fruits of those efforts back to the toolkit.

For example, Microsoft researchers recently published a paper in Nature Methods describing an algorithm called factored spectrally transformed linear mixed models, or FaST-LMM, that's used for genome-wide association studies and was developed on MBF.

According to its developers, unlike existing linear mixed model approaches to GWAS analysis, whose run times scale cubically with the number of individuals studied, FaST-LMM scales linearly, enabling it to handle larger datasets with shorter run times and lower memory use.

The People Want Tools

Microsoft Research launched MBF in 2010 with the aim of familiarizing the life science community with Microsoft's data manipulation and management tools. The group encouraged developers to build plug-ins and submit their work back to the project for release in later incarnations of the toolkit (BI 7/16/2010).

MBF is a language-neutral bioinformatics toolkit for genomics research that was built as an extension to the Microsoft .NET framework. It includes a set of file parsers and file writers for common bioinformatics file formats; a set of algorithms that can manipulate DNA, RNA, and protein sequences as well as multiple sequence alignments; and a set of web service connectors to sites such as the National Center for Biotechnology Information's Blast website.
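The file parsers and sequence algorithms described above are the kind of building blocks such a toolkit supplies. As a rough illustration only, the sketch below shows two of them in Python: reading records from FASTA, one common bioinformatics file format, and computing the reverse complement of a DNA sequence. The function names `parse_fasta` and `reverse_complement` are hypothetical; this is not the .Net Bio API, which targets .NET languages.

```python
"""Illustrative sketch of two toolkit-style operations: parsing FASTA text and
reverse-complementing DNA. Hypothetical helpers, not the .Net Bio API."""

from typing import Iterator, List, Optional, Tuple

# Watson-Crick base pairing for an uppercase DNA alphabet; N (unknown) maps to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text.

    The identifier is the first whitespace-delimited word after '>';
    sequence lines are joined and uppercased.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first '>' header")
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sample = ">seq1 example\nACGT\nTTGA\n>seq2\nggcc\n"
    for name, seq in parse_fasta(sample):
        print(name, seq, reverse_complement(seq))
```

Running the script prints each record's identifier, its sequence, and the reverse complement (for `seq1`, `ACGTTTGA` becomes `TCAAACGT`). A generator keeps memory use low when a file holds many records, which is one reason streaming parsers are a common design choice for sequence files.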

In addition to Microsoft, J&J, and cBio, contributors include researchers at Cornell University, the University of Queensland, and Illumina.

Microsoft Research released a beta version of MBF 2.0 in April. At the same time, the group launched a coding contest to encourage developers to build and submit applications based on the platform (BI 4/29/2011).

These crowd-sourcing efforts, along with feedback from training courses hosted by the MBF team, ultimately led to the decision to rename the platform, move it to Outercurve, and make some updates to its internal workings, Mercer said.

"We received a range of feedback" from the community about how .Net Bio is being used and the kinds of resources researchers require, Mercer said. This information caused the Microsoft team to rethink some of its goals for MBF and make some changes, he said.

He explained that although MBF was originally intended to "provide infrastructure so other people can write the [applications] they need," his team found that "what people want are tools" that meet their research needs. To that end, Microsoft Research plans to adjust future versions of the toolkit to accommodate these requests.

For example, .Net Bio V1 includes a range of command-line utilities that expose different parts of .Net Bio's functionality — a feature that was not available in MBF, he said.

Additionally, the team is putting the finishing touches on a genome browser that will be released in a later version of the toolkit. They also rewrote some of .Net Bio's code to improve its speed and capacity.

The team is also "getting ideas from biologists who have no experience with .Net Bio and programming about areas which it can be expanded."

One of these is a request to provide code for microarray data analysis.

Mercer explained that while .Net Bio doesn’t offer tools for that purpose at present, his team developed a .Net application called Sho that includes a lot of statistical and mathematical functionality that should suffice.

Sho is an interactive environment for data analysis and scientific computing that lets users connect scripts in IronPython with compiled code in .NET. It also includes libraries for linear algebra and data visualization.

Mercer said that a developer could create a single application that uses .Net Bio for biological analysis and Sho for statistical analysis.

Internal Changes

Outercurve Foundation is a not-for-profit foundation that provides governance on intellectual property management and project development to help organizations develop open source software collaboratively.

Launched in 2009 with $1 million in seed funding from Microsoft, the group was initially named Codeplex Foundation but changed its name to Outercurve Foundation in 2010.

In addition to its initial investment, Microsoft inked a three-year commitment to support Outercurve's activities, officials told BioInform, adding that the firm was actively looking for additional sponsors.

The foundation's original Codeplex moniker was based on Microsoft's open source code repository, which goes by the same name. However, the ensuing confusion in the community around the mission and goals of the two resources caused the group to change its name in order to distinguish the foundation's activities from those of Microsoft.

Outercurve manages three galleries: the ASP.NET Open Source Gallery, which hosts eight projects; the Research Accelerators Gallery, which hosts four projects, including .Net Bio; and the Data, Language and System Interoperability Gallery, which hosts five projects.

As part of efforts to increase the diversity of developers contributing code and to provide "a strong foundation of services and support" for communities participating in existing and new projects, Outercurve "revamp[ed]" its leadership structure over the last year, Paula Hunter, the foundation's executive director, told BioInform this week.

She explained that the foundation changed its bylaws to allow as many as eight new members to join its board of directors, which currently has four members.

Hunter said the group intends to form a technical advisory board that will include code contributors who will be eligible to run for open seats on the board after serving a year on the advisory team.

Also in the last year, Outercurve focused on building communities of developers around its projects so that they weren’t made up of Microsoft employees alone, Hunter said.

Currently, she said, six of the foundation's 17 projects are led by non-Microsoft staff, and of the 154 developers contributing code to the projects, 47 percent are not employed by Microsoft.

This benefits Microsoft, since its researchers are no longer doing all the coding, and it should also help the foundation "reassert that we are a vendor-neutral organization and have people contributing projects to the foundation from a broad spectrum," Hunter said.

More importantly, "these projects are becoming a place where people can collaborate and ... where other developers can provide their input on features and functions and in many cases lead the projects," she said.

Outercurve believes its organizational changes have also cast its activities in a better light that could attract new sponsors, Hunter said. At present, Microsoft remains the sole sponsor of the foundation.

Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.
