The Pistoia Alliance, a public/private consortium looking to improve pharmaceutical R&D informatics, recently unveiled several proof-of-concept systems intended to help life science research organizations search genomic repositories effectively.
At a meeting of the consortium held in Boston last week, six vendors unveiled four cloud-based platforms designed to allow pharmaceutical companies to securely access genomic information stored in public databases, as well as upload and analyze their proprietary data.
The vendors were tasked by the alliance's sequence services working group to develop a secure infrastructure that could deliver good performance, scalability, and availability. The proof-of-concept systems were also required to perform Blast queries on datasets and to host secure installations of Ensembl and PlasMapper, an application that generates and annotates plasmid maps using a plasmid DNA sequence as input.
Pistoia, which kicked off in 2007, launched the sequence services project with the goal of providing pharma companies with access to precompetitive databases and software using third-party resources. The guiding principle is that pharma can benefit from informatics vendors who are able to provide services to multiple users who share in the cost of maintaining the service — an improvement over the current model in which individual drug companies maintain current versions of all software and databases behind their firewalls.
The proof-of-concept systems were developed as part of a first round of efforts to develop infrastructure that supports the alliance's vision of collaboration. The alliance is planning additional development phases with an immediate focus on addressing the challenges of next-generation sequencing data.
Because data security is paramount, prior to the meeting, all the cloud-based platforms were subjected to hacking by AT&T. Simon Thornber, a business consultant for GlaxoSmithKline and project lead for the sequence services team, said during his presentation at the meeting that some platforms performed well while others fell short, but developers were quick to correct the issues identified by the hackers.
Participants in the proof-of-concept exercise included UK-based bioinformatics firm Eagle Genomics, which partnered with Cognizant, a technology and business-process outsourcing firm; Thomson Reuters; Constellation Technologies, a cloud computing firm that partnered with Microsoft; and Infosys, which provides business consulting, IT, and outsourcing services.
Rob Gill, Constellation's chief technology officer, told BioInform that the combined offering from his firm and Microsoft provides "a real cloud solution" rather than a "repackaged hosting solution," which is how he characterized some of the other systems. The service lets users access applications and tools on either Windows or Linux platforms, he said.
The team built a Microsoft Azure-based front end for the portal where the sequence data is held and managed. Gill said that the system is simple to use and that companies would also be able to move Azure inside their firewalls. The back end uses a Linux server to provide sufficient compute power to run Blast and data analysis workflows and to host Ensembl. Overlaid on the portal is another Microsoft tool, PowerPivot, which lets users produce graphical representations of their data and divide the data into different populations, for example.
Constellation develops cloud computing tools based on open-source software. To date its science focus has been in the area of particle physics, which generates petabyte-sized datasets, but Gill said that the company counts the genomics space as one of its target markets.
"The actual analytical problems and the complexity of the analysis itself [in physics and bioinformatics] are obviously very different, but the effect of being able to do [bioinformatics analysis] on [particle physics] hardware using the skills that [physicists] have around large data analysis make a huge amount of sense," Gill said. "I think there is a lot of value to be had there."
Infosys, meanwhile, provided an infrastructure based on Amazon's cloud, and said it was able to speed up Blast query job performance by a factor of four through massive data parallelization.
Anirban Ghosh, who works with Infosys' research informatics solutions team, told BioInform that the system has a framework that maps public identifiers with private information in a controlled fashion. It also provides users with administrative access to upload files, map inter-operable terms and identifiers, and moderate who has access to the data.
Pistoia's sequence services working group is currently planning a second phase of the project that focuses on next-generation sequencing data, which is considered to be the next major challenge for pharmaceutical informatics. Participating vendors are already making plans to tailor their systems to handle NGS datasets.
Subhro Mallik, associate vice president for life sciences at Infosys, told BioInform that while conference attendees responded positively to its offering, some potential clients felt they could derive more value from the platform when it encompasses more services around NGS. Infosys expects to have its current platform ready for the market in a few months.
Peter Sheppard, Cognizant's head of life sciences, told BioInform that his company partnered with Eagle because the bioinformatics firm offers the right combination of "experience in delivering these kinds of solutions," as well as a secure platform from which these services could be delivered.
Like Mallik, Sheppard said that while the Cognizant/Eagle platform is ready to go into production, the partners expect that pharma's interest in the platform will increase once capabilities for NGS data analysis and management have been incorporated.
He pointed out that the proofs of concept were intended to show the feasibility of external vendors implementing secure solutions, but most pharma companies already have similar systems operating internally. These groups, he said, are interested in external platforms that can manage NGS content rather than investing in developing their own in-house infrastructure.
While the exact details of phase II are still being ironed out, Thornber said the group is looking at issues around data loading and management as well as developing simple analysis pipelines for RNA-seq and ChIP-seq, for example.
He also said that, depending on the success of the second phase, the group will look in later phases at incorporating additional commercial tools and algorithms as well as increasing the "level of sophistication of the services."
A New Business Model
Nearly 100 representatives of pharma and biotech companies, instrumentation and software vendors, publishers, and academic institutions attended last week's meeting, which was held at Thomson Reuters' Boston campus.
The Pistoia Alliance is one of several initiatives launched in recent years to encourage pharma scientists to work together on non-proprietary aspects of their research. In fact, computational biologists from several major companies in the space published a paper in 2009 calling for more openness, collaboration on pre-competitive bioinformatics projects, and enhanced data sharing (BI 09/04/2011).
The push for revising the standard pharmaceutical industry model is influenced by several factors, including patent cliffs and global economic woes, which are driving pharmas to cut back on costs.
During his presentation, Thornber noted that billions of dollars' worth of drugs will be going off patent over the next few years and the industry isn't getting enough new compounds approved to make up for these losses. Adding fuel to the fire, government spending on pharmaceutical research has decreased since the global economic collapse and the legal fees from several class action suits have cost companies dearly.
The industry has tried to adapt with strategies such as jettisoning non-core services, pursuing mergers and acquisitions, and marketing new non-drug products. But moving forward, it is likely that pharma companies will "externalize" non-core and pre-competitive services, Thornber said, particularly if there are good resources for doing so and if it makes sense financially.
Applying this model to informatics should help pharma grapple with the rise of genomic data, since many companies have invested in sequencing platforms, analysis tools, and commercial data sources over the last several years.
Thornber noted that the industry's current informatics model is costly to maintain, encourages duplication of efforts, and makes collaboration among researchers impossible because of security constraints. Furthermore, companies miss out on the wealth of information that’s available in the open source community.
Moving pre-competitive informatics resources to a cloud-based model should help pharma cut costs while expanding access to new research capabilities, he said.