Early next year, 24 of Europe’s leading bioinformatics research groups in 14 countries will kick off a major collaborative project under the European Union’s Sixth Research Framework Program (FP6). The project, called Biosapiens, has secured a five-year, €12 million EU grant to create a pan-European automated genome annotation network. More importantly, according to the project’s coordinators, the funding establishes bioinformatics research as a long-overdue EU funding priority.
“Many of us in European bioinformatics hoped that the EU would engage themselves more in bioinformatics than they have done previously,” said Søren Brunak, director of the Center for Biological Sequence Analysis at the Technical University of Denmark and a co-coordinator of Biosapiens. The concern, he said, was that the EU would not recognize the value of collaborative bioinformatics research projects like Biosapiens. “We were afraid that the EU would see bioinformatics as something that is attached as a service to each single project,” he said.
Indeed, the EU has historically been stingy with large-scale bioinformatics projects. Swiss-Prot, for example, was forced to commercialize its database after it was unable to secure EU funding in 1996. For many in the European bioinformatics community, then, the new grant is a welcome pledge of support.
Twelve million euros is a large award by EU standards, Brunak said (the total five-year budget for “life sciences, genomics, and biotechnology for health” under FP6 is €2.3 billion), but split among 24 research groups, it doesn’t fund much cutting-edge research, he noted. An even split would give each group about €500,000 over five years, or €100,000 a year. The goal of the project is therefore to integrate existing projects and avoid duplicating genome annotation efforts across Europe. “It’s basically the same data in every country,” he said. The US wouldn’t support multiple NCBIs, he noted, “and the EU should think in the same way within genomics and proteomics.”
Building the Network
The Biosapiens project will hold its official kickoff meeting in February. Research teams representing the UK, Denmark, Italy, Spain, Germany, Belgium, Israel, Sweden, the Netherlands, Switzerland, Poland, France, Hungary, and Finland will collaborate to create an automated genome annotation network. Each of the teams will bring a different area of expertise to the network, Brunak said, so that the project will encompass gene finding as well as promoter analysis, protein-protein interaction analysis, protein localization prediction, alternative splicing analysis, and protein fold recognition.
Janet Thornton, director of the European Bioinformatics Institute and coordinator of the Biosapiens project, said that although the EBI already has a genome annotation pipeline in place that relies on proven bioinformatics methods, the network should stimulate the development of new methods to improve the process.
“In general, the automated annotation of genomes is still quite limited,” Thornton said. By supporting the development of new technology, she said, “we’ll be able to use the best and most modern methods, and pick up the annotations as soon as they’re developed by the developers, rather than waiting for them to get into the EBI pipeline; we’ll be able to plug into every laboratory and pick up their best annotations.”
The backbone of the annotation network will be the Distributed Annotation System (DAS), the Napster-inspired technology developed by Lincoln Stein to allow researchers to share their genome annotation data. Thornton said that EBI’s Ensembl database currently uses DAS internally, so one of the first goals of Biosapiens is to extend that technology to the network participants. In addition, she said, the project participants will develop a modified version of DAS to support protein-level data. This version of the system “will allow you to annotate residue by residue, so that you can say this residue is involved in catalysis or recognition, or whatever the purpose may be,” she said.
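To make the idea of residue-by-residue annotation concrete, the following is a minimal Python sketch of how annotations contributed by several laboratories might be pooled per residue. It is an illustration only, not the DAS protocol or the Biosapiens design: the `ResidueAnnotation` record, the `merge_by_residue` and `roles_at` helpers, the protein identifier "P1", and the source names "lab_a" and "lab_b" are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueAnnotation:
    """One hypothetical residue-level annotation from one contributor."""
    protein_id: str  # hypothetical protein identifier
    position: int    # 1-based residue position in the protein sequence
    role: str        # e.g. "catalysis" or "recognition"
    source: str      # hypothetical name of the contributing laboratory


def merge_by_residue(annotations):
    """Group annotations from many sources by (protein, residue position)."""
    merged = defaultdict(list)
    for ann in annotations:
        merged[(ann.protein_id, ann.position)].append(ann)
    return dict(merged)


def roles_at(merged, protein_id, position):
    """Return the sorted, de-duplicated roles reported for one residue."""
    return sorted({a.role for a in merged.get((protein_id, position), [])})


# Made-up example data: two laboratories annotating the same protein.
EXAMPLE = [
    ResidueAnnotation("P1", 57, "catalysis", "lab_a"),
    ResidueAnnotation("P1", 57, "recognition", "lab_b"),
    ResidueAnnotation("P1", 102, "catalysis", "lab_a"),
]

if __name__ == "__main__":
    merged = merge_by_residue(EXAMPLE)
    for (protein_id, position) in sorted(merged):
        print(protein_id, position, roles_at(merged, protein_id, position))
```

Keeping the contributing source on every record means that independent annotations of the same residue sit side by side rather than overwriting one another, which is the property a distributed network of annotators would need.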
Questions remain as to how well DAS will work for this purpose, according to Brunak, but he noted that the system is “a good compromise between simplicity and functionality.” While more complicated systems might work better in theory, he said, “I think the experience in bioinformatics has been that it is not necessarily the most complex strategy in terms of computer science that actually will fly, so we tried to be quite pragmatic in what we want to do and achieve.”
Thornton said that the project is organized along two axes in order to avoid redundancy in the annotation process: a vertical, method-oriented axis in which research teams are organized into “work groups” to focus on particular technology areas, such as gene identification, structure prediction, or alternative splicing; and a horizontal, biology-oriented axis that directs the annotation along several “thematic programs.” The first two thematic programs are the hepatitis C virus and HIV, Thornton said, and another is Down syndrome and chromosome 21. Researchers across the entire network will apply their tools to each of the thematic programs for around three to six months, she said.
Thornton said that the project coordinators are planning to host a dedicated resource for the annotation data, but the details of the name, URL, and related particulars have yet to be sorted out. If all goes well, she said, annotation data from the project should be online within a year.