This article has been updated from a version posted May 24 to include additional comments.
Developers of the Kyoto Encyclopedia of Genes and Genomes are appealing to the life sciences community to support the widely used biological pathway database as its funding dwindles.
Minoru Kanehisa, a professor in the bioinformatics department at Kyoto University and the principal investigator of the project since it began in 1995, made the appeal in a statement published on KEGG's website last week.
In the letter, Kanehisa said that as of July 1, 2011, KEGG's FTP site for academic users will be transferred from its current home at Kyoto University to NPO Bioinformatics Japan, a not-for-profit organization that Kanehisa and some colleagues founded in order to raise funds to keep the site going.
"Contrary to popular perception, KEGG has never been a public database, as there has never been an official long-term commitment from any government agency," Kanehisa said in the letter. "Although I have managed over the years to obtain multiple and overlapping short-term research grants to support KEGG, this has become more difficult now that I am reaching the mandatory retirement age."
Under the new ownership, the FTP site will be available to paid subscribers only. Academic licenses will be jointly handled by NPO Bioinformatics Japan and Pathway Solutions, the firm that already manages commercial licenses for the resource.
Academic subscriptions will cost $2,000 per year for individual users and $5,000 per year for organizations.
The KEGG website will continue to be freely available, Kanehisa told BioInform.
Kanehisa said that around 3,000 users per month access the FTP data and nearly 200,000 users visit the website per month.
"If we get subscriptions from 10 percent of the current FTP users ... then we can survive and we will get stronger," he said, adding that all funds from the both the commercial and academic licenses will be reinvested to further the development of KEGG.
Currently, KEGG is maintained by 25 full-time staff and between five and ten part-time employees. Kanehisa said in his letter that the current funding for the database is not enough to support this staff, making the subscription model necessary.
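Kanehisa's 10 percent target can be put into rough numbers using the figures quoted above. The sketch below is only a back-of-the-envelope estimate: it assumes each of the roughly 3,000 monthly FTP users would map to one annual subscription, and because the split between individual and organizational licenses is not stated, it reports a lower and an upper bound rather than a single figure. The function name is illustrative.

```python
# Figures quoted in the article.
FTP_USERS_PER_MONTH = 3000
TARGET_SHARE = 0.10            # "10 percent of the current FTP users"
INDIVIDUAL_FEE_USD = 2000      # per year
ORGANIZATION_FEE_USD = 5000    # per year


def subscription_revenue_range(users, share, low_fee, high_fee):
    """Return (subscribers, low, high): annual revenue bounds in USD.

    Assumption: one subscription per converted FTP user. The low bound
    prices every subscription at low_fee, the high bound at high_fee.
    """
    subscribers = round(users * share)
    return subscribers, subscribers * low_fee, subscribers * high_fee


subscribers, low, high = subscription_revenue_range(
    FTP_USERS_PER_MONTH, TARGET_SHARE, INDIVIDUAL_FEE_USD, ORGANIZATION_FEE_USD
)
print(f"{subscribers} subscribers: ${low:,} to ${high:,} per year")
# prints "300 subscribers: $600,000 to $1,500,000 per year"
```

Under these assumptions, the target would bring in between $600,000 and $1.5 million a year, before any commercial license revenue.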
Alongside the subscription model, KEGG developers are making some changes to the website that Kanehisa said will make it more usable. For example, the KEGG Markup Language, or KGML, which was previously only available via the FTP site, is now available on the web.
KGML is an exchange format for the KEGG pathway maps that are manually drawn and updated. It enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of protein and chemical networks.
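To give a sense of what computational analysis of a KGML file can look like, here is a minimal sketch that reads a pathway into nodes and edges using Python's standard library. The XML fragment is hand-written for illustration and is not real KEGG data; it follows the general KGML layout of a `pathway` root holding `entry` and `relation` elements, and the function name `load_network` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, KGML-style fragment for illustration only (not real KEGG data).
SAMPLE_KGML = """<pathway name="path:demo00001" org="demo" number="00001" title="Illustrative pathway">
  <entry id="1" name="demo:geneA" type="gene"/>
  <entry id="2" name="demo:geneB" type="gene"/>
  <entry id="3" name="demo:compoundX" type="compound"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
</pathway>"""


def load_network(kgml_text):
    """Parse KGML text into (title, nodes, edges).

    nodes maps entry id -> (name, type); edges is a list of
    (entry1, entry2, relation type, [subtype names]).
    """
    root = ET.fromstring(kgml_text)
    nodes = {e.get("id"): (e.get("name"), e.get("type")) for e in root.iter("entry")}
    edges = []
    for rel in root.iter("relation"):
        subtypes = [s.get("name") for s in rel.findall("subtype")]
        edges.append((rel.get("entry1"), rel.get("entry2"), rel.get("type"), subtypes))
    return root.get("title"), nodes, edges


title, nodes, edges = load_network(SAMPLE_KGML)
print(title, len(nodes), "entries,", len(edges), "relation(s)")
```

A node-and-edge structure like this could then be handed to a graph library for drawing or network analysis.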
The KEGG application programming interface service will also continue to be freely available, Kanehisa said.
KEGG API, which consists of the SOAP/WSDL interface and the REST interface to the KEGG system, allows customization of KEGG-based analysis to, for example, search and compute biochemical pathways in cellular processes or analyze genes in completely sequenced genomes.
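As a rough illustration of REST-style access, the sketch below builds request URLs and fetches them with Python's standard library. It assumes the URL layout of the public REST service at rest.kegg.jp (an operation name followed by slash-separated arguments), which may differ from the interface available when this article was written; the helper names `kegg_url` and `kegg_fetch` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base address of the public KEGG REST service.
KEGG_REST_BASE = "https://rest.kegg.jp"


def kegg_url(operation, *args):
    """Build a REST URL of the form <base>/<operation>/<arg1>/<arg2>..."""
    parts = [operation, *args]
    # Keep ':' and '+' unescaped, since they appear in identifiers such as
    # 'hsa:10458'; any other unsafe character in a segment is percent-encoded.
    return KEGG_REST_BASE + "/" + "/".join(quote(p, safe=":+") for p in parts)


def kegg_fetch(operation, *args, timeout=30):
    """Fetch the plain-text response for a request (requires network access)."""
    with urlopen(kegg_url(operation, *args), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building URLs needs no network connection:
print(kegg_url("list", "pathway", "hsa"))
print(kegg_url("get", "hsa00010", "kgml"))
```

Calling `kegg_fetch("get", "hsa00010", "kgml")` would, under the same assumption, return the KGML text for a single pathway.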
Kanehisa is also reaching out to international funding organizations that might be willing to provide grants to keep KEGG going.
He noted that one of the reasons NPO Bioinformatics Japan was formed was to explore mechanisms by which commercial firms can contribute to the resource in a way that ensures that the database remains free for academic users and funding remains constant.
"If you keep asking for government funding, you will always face the possibility of funding [being] cut suddenly," he said. However, "many commercial companies buy out the databases developed by academics, then the database becomes restricted, but I don’t want that to happen ... I am trying to establish a third way of getting people involved ... It is more like a non-profit organizational type approach."
KEGG's existence over the past ten years was supported in large part by funds from the Institute for Bioinformatics Research and Development (BIRD) arm of the Japan Science and Technology Agency as well as several short-term research grants.
On April 1, BIRD was converted to the National Bioscience Database Center and refocused on supporting database integration rather than individual resources like KEGG.
Although the main database won't be funded, the NBDC has provided a three-year grant to support the integration of the KEGG MEDICUS directory with disease and drug information used in medical practices. However, "this grant is not sufficient to continue to hire my talented crew of KEGG curators and software developers," Kanehisa said in his letter.
KEGG is not the first widely used free resource to change its model after running into money troubles.
In 2009, the National Science Foundation decided not to renew funding for the Arabidopsis Information Resource, TAIR (BI 12/04/2009).
As part of its efforts to secure the funds needed to keep the database running, TAIR developers launched a corporate sponsorship program in 2010 (BI 03/19/2010) and signed on Dow AgroSciences and Syngenta Biotechnology as its first commercial benefactors later that year (BI 08/06/2010).
Monsanto, the Gregor Mendel Institute, and Grass Roots Biotechnology have since signed on as TAIR sponsors.
Kanehisa told BioInform that since he posted the call for support, members of the community have been sending messages and some have registered to learn more about the changes to KEGG.
Brad Chapman, a research associate in the department of biostatistics at Harvard, said that KEGG's licensing change “underscores the challenges” that face large infrastructure projects.
“Our current funding and recognition models favor some aspects of large projects, like development of innovative new features and integration with existing resources,” he told BioInform via e-mail, “however some other critical functions are under-recognized, like data curation or infrastructure development.”
He continued, “unfortunately you are not going to get a paper or grant for chasing down a subtle problem with a record, or for writing and documenting new interfaces to query the data; but these types of tasks make a huge difference in the usefulness and correctness of the data repositories we rely on to do good science.”
To ensure that other well-used databases don’t go the way of TAIR and KEGG, Chapman believes that the community needs “to find a way to translate our use of these resources into financial rewards to the project that can be used to fund and maintain developers.”
Moving forward, “I would like to see this type of funding factored into grants so researchers would pay in the same way they pay for a microscope or new server, with funds earmarked specifically for use of infrastructure,” he said. “It'll take some work to shake out the best ways to fund these; I'd prefer models that favor open access of the data.”