CHICAGO – Less than two months after going live with a system for allocating and managing bioinformatics resources among its various research units, Pfizer already is moving beyond its initial use case of single-cell RNA sequencing data.
Enoch Huang, vice president of integrative biology and medicinal sciences for Pfizer's worldwide R&D operations, said that the pharmaceutical heavyweight is working with vendor Seven Bridges to retrofit bulk RNA-seq processes onto the technology platform.
Pfizer and Boston-based Seven Bridges announced their collaboration in February and the software company delivered a prototype "project gallery" in early summer. The effort officially launched Sept. 1, according to Huang.
The gallery is a centralized dashboard for tracking Pfizer research projects, including the project name, key people, metadata tags, creation date, the server location, and requests for data access.
At the time the deal was announced, the firms said that they would collaborate on management of single-cell RNA sequencing data, though Huang said that scRNA-seq is just a starting point.
"We [initially] wanted to find a use case with Seven Bridges that met a clear and present need, and that was single cell," Huang explained. "We wanted to pick a data type that did not have too much legacy associated with it, just because it's easier to deal with [when there are not] a lot of embedded, entrenched processes."
In other words, the test case was about preventing data silos from forming in the first place rather than breaking down existing ones.
Huang said that scRNA-seq represented a middle ground between the more mature bulk RNA-seq and the nascent field of spatial transcriptomics.
"Let's start with single cell RNA-seq," he said. "If we get that right, we can probably go back and address some of the other unfinished business with bulk RNA-seq using single-cell RNA-seq as a base … and that also sets us up well in the future for spatial transcriptomics once the actual experimental plans are clear there."
Huang described the Seven Bridges relationship last month in a presentation at the Bio-IT World conference in Boston, and he discussed the integrative biology program in more detail in an interview this week.
Huang said the Seven Bridges partnership is an effort to both centralize and decentralize computational biology and bioinformatics. While that may seem contradictory, it makes sense in an environment like Pfizer's where there are multiple research units, each with its own IT framework.
The pharma company has seven embedded bioinformatics groups, organized around therapeutic areas as well as functions such as early clinical and toxicology. "These are the embedded computational practitioners that primarily focus on downstream applications of process data," Huang explained.
Each discipline has what the firm calls partner-line or platform-line functions, but Pfizer's individual research units were unclear about who was responsible for them, according to Huang.
"Sometimes [research teams] would address questions on a one-off basis, but it wasn't clear whether that was their mandate or not," Huang explained. "They were just looking for capable computational biologists, so they would ask around until there was someone who was willing to take the request."
This created overlap, which Huang called "confusing" to the research units, some of which had more resources than others. Pfizer decided that the best approach to this problem was to start with a blank slate and design an organization to allocate bioinformatics skills to research units based on their actual needs.
"The real question was, do we go with a fully centralized model or a fully decentralized model or a hybrid model?" Huang recalled. The company chose a hybrid model, where downstream work that was specific to each disease area would be embedded into research units, while upstream data management that was common to all units would be centralized.
Previously, units would "roll their own" by creating local systems for upstream platform support, Huang said. "That's not desirable because you can duplicate pipeline development, you can duplicate data management systems," and data ended up in silos.
Raw and processed data alike were decentralized and difficult to access and share. "We were all developing pipelines in isolation," Huang said.
Pfizer founded its integrative biology program in late 2018 to address needs and pain points that Huang's team identified in an internal report. "Not surprisingly, many of these departments were starting to grapple and struggle with the advent of single-cell RNA-seq technology in different ways," Huang said.
The integrative biology team is responsible for basic infrastructure and common needs across Pfizer R&D. Because medicinal sciences is considered a partner line rather than a research unit within Pfizer, Huang and his team could act as a neutral party.
"I wanted a solution that was sustainable for the long term, flexible, customizable for our specific use case around single-cell RNA-seq data type, but also the way we generate the data within Pfizer, which is seven different ways, in theory," Huang said. He also said it was important to follow the FAIR principles, which call for data to be findable, accessible, interoperable, and reusable.
The pharma firm's integrative biology scRNA-seq technical group identified needs in four areas: the computational environment, data visualization, workflows, and data storage. The group decided to defer the visualization question and focus on the other three areas in the near term. That led to the decision to bring in Seven Bridges.
Huang said that RNA-seq data management was a "problem that had been solved before," and thus better left to a vendor than to Pfizer IT staff. The other issues were more specific to the drugmaker.
He said that Pfizer has not decided whether to allow outside researchers to access the platform, though the Seven Bridges system would let the drug company control permissions for third-party access if it chooses to open the platform in the future. As implemented for Pfizer, the Seven Bridges platform is compatible with both Google Cloud Platform and Amazon Web Services.
"It does open a door for a more efficient way to do similar collaborations or participate in consortia because we could be on the same cloud-based system," Huang said. "Clearly, that was on the mind when we designed the collaboration and platform."