NEW YORK (GenomeWeb) – European Commission plans to create a new cloud-based infrastructure for data storage and sharing across Europe are being welcomed by the continent's genomics community, who view it as "mandatory" for the success of their projects.
At the same time, questions remain about how the €6.7 billion ($7.6 billion) that has been pledged for the implementation of the European Open Science Cloud over the next four years will be spent, and how existing cloud initiatives, such as ELIXIR, will interact with the proposed infrastructure.
Because of this, many experts interviewed by GenomeWeb since the EC announced its European Cloud Initiative last week said they are keen to play a role in the development of the EOSC so that it best addresses the needs of the European genomics community.
"Cloud approaches in the life sciences are clearly very relevant to basic and translational science, and have the potential to tremendously benefit society," said Jan Korbel, a principal investigator in the genome biology unit at the European Molecular Biology Laboratory in Heidelberg, Germany.
According to Korbel, the success of the European Cloud Initiative will depend on how the funds are used, with attention paid to crafting specialized interfaces for genomics data analyses as well as improving scalability to enable user access to large repositories when necessary.
"Merely repackaging money already spent on IT infrastructure may not lead to significant breakthroughs," said Korbel. "Obtaining more [central processing unit] power such as buying exascale machines ... will not address the needs of the European genomics community."
Korbel currently co-directs the data-sharing efforts of the International Cancer Genome Consortium's Pan-Cancer Analysis of Whole Genomes project, which he believes could inform the creation of the EOSC. He noted that the European genomics community currently lacks easily and remotely accessible clouds, and argued that important reference datasets in the life sciences, such as European cancer genome datasets, should always be available via the EOSC to facilitate research and, ideally, be stored close to usage sites.
"If we do not act this way, we risk falling behind academic cloud initiatives in biology and health that are pursued outside of Europe," Korbel said.
"It is essential that past lessons are learned and that this next phase of investment is driven by users to provide excellent science, not to meet the needs of service providers."
It is a perspective shared by a number of informatics insiders in Europe's most elite institutes and companies who wish to see the "single digital market" envisioned by the EC's initiative created according to the needs of users.
In its announcement, the commission said the EOSC should provide a "trusted environment" for data sharing across technologies, disciplines, and borders. The EC also pledged "world-class supercomputing capability, high-speed connectivity, and leading-edge data and software services."
By next year, the EC aims to make available via the cloud all data generated by projects funded through its Horizon 2020 research and innovation program. By 2020, it aims to have developed and deployed a large-scale European high-performance computing, data storage, and network infrastructure.
Representatives of the EC involved with the coordination of the initiative could not be reached for further comment.
Roland Eils, head of theoretical bioinformatics at the German Cancer Research Center (DKFZ) in Heidelberg, portrayed the initiative as urgently needed because European genomics data is frequently being moved outside of Europe, often to services hosted by US providers such as Amazon and Google, leaving it subordinate to the decisions of its hosts.
"In effect, it means an aggregation of data and knowledge on platforms outside of Europe, which will lead in the end to a strong dependence on the respective infrastructure providers," Eils said, a trend that, in his words, jeopardizes the future success of the European genomics community. For that reason, he and his colleagues at DKFZ see the EOSC as a welcome "counteractive measure" to this trend.
"There are American companies hosting clouds in Europe," Eils noted, "but no single European cloud infrastructure is available to perform genomic analysis with large datasets, or enable the secure transfer, storage, share and access of these data."
DKFZ over the weekend hosted a workshop devoted to cloud solutions for the life sciences at its Heidelberg headquarters. EMBL's Korbel helped to organize the event.
Eils said that the EOSC was discussed at the workshop and "highly accepted" by attendees, as it should encourage data-driven collaboration. "In genomics, it will democratize the IT landscape, allowing efficient processing even for those researchers not having high-performance infrastructure at their disposal."
Eils added that such a resource is "mandatory for the sustainable success of the scientific community" in Europe, though he echoed Korbel's concerns. "The question is which part of the proposed funding amount will be dedicated to those needed solutions with a clear benefit for the major part of the scientific community," Eils said.
"It seems to be foreseeable that, in terms of the EOSC, the planned supercomputing and quantum technologies will require an enormous investment;" said Eils. "This will be essential for the flagship projects and for the development of new technologies, but won't help the scientist in their daily work to process and share data."
Steven Newhouse, head of technical services at the European Molecular Biology Laboratory's European Bioinformatics Institute (EBI) in Hinxton, UK, similarly stressed the need for user input in the creation of the EOSC.
"The EU has been investing in European e-infrastructures for several decades," said Newhouse, who also attended the Heidelberg workshop. "This has produced some successes but has frequently not been driven by users, but by service providers," Newhouse said. "It is essential that past lessons are learned and that this next phase of investment is driven by users to provide excellent science, not to meet the needs of service providers," he said.
Ewan Birney, director of EBI, said that there are historical reasons why the European genomics community must engage the EC to inform the creation of a genomics-friendly cloud infrastructure. According to Birney, the EC has in the past invested heavily in the computational side (in CPU power, for instance) because it was looking to serve the needs of physicists or meteorologists, who required investments in data storage. With genomics now on its agenda, the EC must take a different approach, dealing with communities that require more assistance in sharing and analyzing data than in storing it.
"We are hoping that this will set standard methods for cloud approaches, including commercial clouds and any available academic clouds," said Birney. He said that existing clouds are diverse in terms of internal scaffolding and their abilities to employ scale. By streamlining cloud environments across the continent via the EOSC, academic researchers should be able to benefit and overcome some of the data-sharing hurdles that have so far limited collaboration.
Birney added that, from his perspective, the EOSC should complement the European Life-science Infrastructure for Biological Information (ELIXIR), a public initiative for sharing molecular data among European research groups. Following a 2013 kickoff, the EC pledged €19 million toward ELIXIR last year via the Horizon 2020 program. EBI is currently playing a coordinating role within ELIXIR, along with nodes in more than a dozen other countries.
Birney noted that Newhouse has been "heavily engaged" in EOSC discussions, and hopes that EBI will be able to play a role as the cloud is formed. "We still feel it is concentrated a little too much following traditional computing intensive sciences, but we are happy now that [the genomics community] is at the table, which is great," Birney said.
Newhouse agreed that the EOSC's emphasis on infrastructure should support activities such as ELIXIR, along with other projects, such as Helix Nebula, a private-public data-sharing partnership in which EMBL is playing a coordinating role. The partnership recently received €5.3 million in Horizon 2020 funding to develop HNSciCloud, a "competitive marketplace of innovative cloud services serving the European Research Area," over the next two years.
"The proposed investment from the EC into the EOSC will help drive the development and integration of these and other European initiatives to hopefully provide a set of services that will be directly usable by the life-science community," said Newhouse.
DKFZ's Eils also noted that it will take years to build up ELIXIR, and that the process could be "accelerated significantly" by the EOSC. "The same is true for the German Network for Bioinformatics Infrastructure (de.NBI), where a membership in ELIXIR is envisaged," he added.
Costas Bekas, manager of the foundations of cognitive computing group at IBM Research in Zurich, Switzerland, said that all of the initiatives underway should ultimately foster the creation of cloud environments that are of use to the European genomics community.
"Suddenly, when you have a large-scale resource like this, you also have competition, which means the best solutions will be identified by the user," said Bekas, adding that "scientific evolution will find the best way forward."
Bekas said that while IBM, which employs 380,000 people worldwide, has its own internal cloud infrastructure, it sees the EOSC as "extremely valuable" to its ongoing research efforts.
"As a large-scale enterprise, we need to have collaboration at a global scale, and we are in need of this kind of tool set," said Bekas. "Right now in Zurich we are collaborating with 80 to 100 other institutions and companies, and they need to have a way for collaborating at the scientific level," he said.
Bekas stressed that he is "completely confident" that the EC will consult with the European genomics community as the EOSC is implemented, noting that any issues raised by the community will be addressed, as the EC cannot afford to ignore them.
"The importance of this cloud is so big that all of the stakeholders around will make sure the most important aspects will be addressed," Bekas said.