TAIR Eyes Corporate Sponsorship Program as Other Model-Organism Databases Focus on Sustainability


By Vivien Marx

This article has been updated to remove comments that were provided for a previous article and erroneously included here.

The Arabidopsis Information Resource is about to launch a corporate sponsorship program in order to remain afloat after the National Science Foundation decided to phase out its funding.

TAIR principal investigator Eva Huala told BioInform that she and her team have already received "some early indications of interest" from companies that are registered TAIR users.

"We don't expect this will bring in a large amount, but every little bit helps," she said, adding that the hope is to obtain "somewhere around 10 to 20 percent" of the resource's operating budget in this fashion.

The National Science Foundation decided last year to renew TAIR funding only until this August and then to phase out the resource's funding over the next few years. NSF awarded TAIR a total of $13.7 million between 1999 and 2009. Last year, it awarded the resource a $1.6 million grant that expires in August 2010, and plans to award it $1.2 million in 2011, $800,000 in 2012, and $400,000 in 2013 [BioInform Dec. 4, 2009].

Huala said that she has had to lay off two people, while "another person has just departed voluntarily due to the uncertainty of our situation, and we're likely to lose more people in the near future if our fundraising and grant-writing efforts don't begin to pay off soon."

TAIR's situation underscores the perennially tenuous nature of publicly funded bioinformatics databases and other resources.

"As is happening with TAIR, uncertainty of funding destabilizes projects like this," Harvard University's William Gelbart, co-developer and principal investigator of the fruit fly resource FlyBase, told BioInform. "If you lose key people, it is sometimes tough to recover from that."

Given the dropping costs of sequencing, model-organism database resources stand to take on greater significance as more and more sequence data comes online. But long-term funding for those resources is not guaranteed.

TAIR is not the only model organism database facing funding challenges. For example, Compositdb, the database for Compositae species such as lettuce and sunflower, has posted a note on its website that it is "currently without support." It had been funded by the US Department of Agriculture's Agricultural Research Service.

Due to "funding constraints and the uncertainty of long-term funding, we are not attempting a comprehensive database" but are trying to archive "as much information as possible that is not readily available elsewhere and provide links to access information that is obtainable from other sources," the Compositdb scientists wrote.

The scientists said they are interested in expanding the resource for "any Compositae species for which there is sufficient data and interest."

Considering Commercial Support

Huala said that she and the TAIR team are waiting to hear back about a few "smaller" NSF grants and continue to work on new funding proposals. "TAIR is the only major model organism database not funded by [the National Institutes of Health] despite many biomedical advances springing from Arabidopsis research over the years," she said.

NSF has "strongly encouraged" TAIR to seek funding from other sources, including subscription fees for companies or academic labs and funding from other US agencies or other countries, Huala wrote last year in an open letter to TAIR users.

A corporate sponsorship model would be "much better than going to a system where we require non-academic users to subscribe because it will allow us to keep the TAIR site freely accessible without log-ins and we will also avoid the time-sinks of enforcement, marketing, and managing subscription accounts," Huala said.

Solid Footing

Model organism databases that rely on NIH funding are not as hard-hit as TAIR and Compositdb, but that doesn't mean that sustainability is not a concern.

The Saccharomyces Genome Database, for example, will be submitting a renewal proposal for the resource this fall, Mike Cherry, principal investigator of SGD, told BioInform.

SGD received approximately $3.8 million in funding from the National Human Genome Research Institute in fiscal year 2009, including a $977,000 grant under the American Recovery and Reinvestment Act that funds the development of new user interfaces and analysis tools.

Cherry noted that the ARRA grant is welcome because SGD's funding under its primary grant "has effectively decreased over the past few years." While the stimulus funding is short-term, it "will allow a wealth of enhancements for our integration and analysis of these large datasets," Cherry said.

He explained that the SGD team works on database, hardware, and software environment updates "quite slowly." For example, the team has worked for the last two years to design, implement, and test a new schema. Overall, "maintenance of the site is the focus of our staff; this does not allow much new tool development," Cherry said.

Harvard's Gelbart said that FlyBase has been "fortunate" that NHGRI takes bioinformatics "seriously." NHGRI has awarded FlyBase $4 million for fiscal year 2010, and awarded it $400,000 in stimulus funding in fiscal year 2009.

Gelbart noted that resources like FlyBase develop a number of tools that ultimately benefit the broader research community. For example, FlyBase developers created the Chado ontology-based database system, which has "been attractive to a number of communities," as well as the genome annotation curation tool Apollo and the genome viewer GBrowse.

Chado and Apollo were developed at FlyBase, and GBrowse was produced by Lincoln Stein's group, then at Cold Spring Harbor Laboratory.

With reusable tools like these, "anyone can pick [them] up to start their effort and hopefully save" the need for full-time employees, Gelbart said. At the same time, each new genome presents new informatics issues. "Nothing's for free," he said.

These challenges are "only going to be exacerbated by next-gen sequencing and assembly," he said. "The floodgates are opening."

FlyBase currently hosts genomic information for 12 Drosophila species and is working out how to incorporate RNA-seq data and protein-protein interaction data. "We're about two releases away from making that public, as well as the full set of RNA-seq data," he said.

Gelbart noted that databases are a crucial component of the genomics research ecosystem. "It doesn’t matter how cool the [sequencing] technologies are, [without databases,] the data will never get shared. That's the bottom line."

At the same time, he noted that funding commitments for major databases are "marathons, not sprints."

Sustainability of bioinformatics resources has been an ongoing challenge for the field. A report published last year by the National Academies, titled "Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age," pointed out that high-throughput technology and exploding datasets highlight the increased need for computational methods of data management, but concluded that maintaining those resources poses a challenge for funding agencies.

"The questions of who pays, how much, and for how long are at the heart of the problem of how to ensure long-term stewardship of research data," the report noted.

Research data represent "a sizable investment of human and financial resources, and preserving those data typically costs less than generating them in the first place," according to the report. "Nevertheless, maintaining high-quality and reliable databases can have significant costs."

Furthermore, the report said, future uses of data are "difficult to predict," and the return on those costs is often not clear. "In many fields, there still is no consensus as to who should maintain large databases or who should bear the costs."

Gelbart chairs the NHGRI panel on prioritization of new genome sequencing projects and said that NHGRI recognizes that the model organism databases and portals are "community resources."

He said he is hopeful that NIH will continue to recognize the importance of sustainability for bioinformatics resources — not least because former NHGRI director Francis Collins is now director of NIH. Gelbart said he expects that "the vision and foresight he had in his former position will continue to help extend that vision to all of NIH."