New Specification Simplifies Communication of NGS Data Analysis, Results for FDA Regulatory Review


NEW YORK (GenomeWeb) – A consortium of biomedical stakeholders has developed a systematic way to describe high-throughput sequencing computational workflows and analyses that it hopes will shorten turnaround times for regulatory review while alleviating the perennial problem of research reproducibility.

The consortium, spearheaded by George Washington University and the US Food and Drug Administration, recently published the first iteration of the so-called BioCompute Object (BCO) specification document. It describes a framework for presenting details about bioinformatics workflows and procedures that satisfies regulatory review requirements and ensures the appropriate use of genomic analysis pipelines once they are created, according to several individuals interviewed for this story.

Recent technological advances in the healthcare and biomedical space have made it possible to generate large, heterogeneous datasets for disease prognostics and diagnostics and other uses, but how those data were used is often a black box, Raja Mazumder, an associate professor of biochemistry and molecular medicine at the GW School of Medicine and Health Sciences and one of the BCO's developers, said in an interview.

For example, it is insufficient to simply identify the software used to find variations in a tumor genome by name alone. "If you go and start using that exact software with different parameters or a different version number … your results will be different. There might not even be a single match between what I did and what you did," he explained.
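Mazumder's point can be sketched in code: if provenance captures the full tool description rather than just its name, two runs that differ only in version or parameters are immediately distinguishable. The tool name, version, and parameters below are hypothetical, chosen only to illustrate the idea.

```python
import hashlib
import json

def provenance_fingerprint(tool, version, parameters):
    """Hash the full tool description, not just its name, so runs with
    different versions or parameters cannot be conflated."""
    record = {"tool": tool, "version": version, "parameters": parameters}
    # Canonical JSON (sorted keys) makes the hash deterministic.
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

# Hypothetical variant caller invoked two ways:
run_a = provenance_fingerprint("variant-caller", "2.1", {"min_depth": 10})
run_b = provenance_fingerprint("variant-caller", "2.2", {"min_depth": 10})

# Same tool name, different version -> different fingerprints.
print(run_a == run_b)  # False
```

A reviewer comparing the two fingerprints sees at a glance that the analyses are not equivalent, even though the software "name" matches.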

That has resulted in a "catastrophic" lack of reproducibility that pervades both the research and regulatory domains, said Vahan Simonyan, HIVE team principal investigator in the FDA's Center for Biologics Evaluation and Research (CBER), Office of Biostatistics and Epidemiology, and one of the developers of the BCO. And it's problematic for regulatory agencies like the FDA that depend on detailed research data to make determinations about the validity and usability of healthcare products.

At the FDA, "our mission is to evaluate the safety and efficacy of medical products" and increasingly "we make these decisions based on complex data and sophisticated computational pipelines," Simonyan said. "When you [use] heavy computational pipelines [to demonstrate] the quality of a medical product and [generate] data which is used as evidence for the medical product, we need to have a good, consistent and most importantly, an interpretable recipe."

Right now, details of these computational processes submitted to the FDA are often incomplete, resulting in costly "regulatory loops" that drag on and on, and those costs are often passed on to consumers when products go to market, according to Simonyan. "Every time that we receive information that is incomplete [and] we [ask] for more information, this communication goes through multiple layers of our organization and that takes a significant amount of time."

The hope is that the specification will simplify those communications because pharma companies will know upfront what elements the FDA expects to see in submissions as well as what computational formats to use if they plan to submit their products for regulatory approval. They can account for those details when they map out their projects. It should also cut down on the time that FDA regulatory scientists spend trying to figure out what researchers did, according to consortium members.

The BioCompute Consortium grew out of a Broad Agency Announcement proposal that invited open participation from academia, pharmaceutical companies, and research and regulatory organizations.

For consortium member Seven Bridges Genomics, "our motivation is really rooted in our experience working with our customers," said Dennis Dean, a research and development scientist at Seven Bridges and one of the major contributors to the BCO. "We work with a good number of big pharma and biotech and national consortia so we are in this unique position to see what the challenges are in creating workflows."

Furthermore, "we've been involved in the development of technologies to make it easier to build compute workflows and communicate them," he added. Working with members of the BioCompute consortium "gave us an opportunity to think about how we accelerate the adoption and development of drugs."

Mazumder believes that the field is now ready to have these conversations in earnest. Until 2014, when the first meeting of the consortium was held, no one was really talking about how NGS analysis results should be communicated to regulatory agencies like the FDA. "For something like this to happen, you need to have a lot of other things happen," he said. For example, there had to be clear standards for capturing, describing, and storing information — FASTQ and VCF files, for instance, are generally used to record and share next-generation sequencing and mutation data — as well as established ontologies for describing genomic data such as the Gene Ontology.
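For readers unfamiliar with the formats mentioned above, a FASTQ record is simply four lines of text per sequencing read. The sketch below (with a made-up read) shows the layout and a minimal parser.

```python
# Minimal sketch of the four-line FASTQ record format; the read is invented.
record = (
    "@read1\n"    # line 1: read identifier, prefixed with '@'
    "GATTACA\n"   # line 2: the base calls
    "+\n"         # line 3: separator (may optionally repeat the identifier)
    "IIIIIII\n"   # line 4: per-base quality scores, Phred-encoded as ASCII
)

def parse_fastq_record(text):
    """Split one FASTQ record into (identifier, sequence, qualities)."""
    ident, seq, _, qual = text.strip().split("\n")
    assert ident.startswith("@") and len(seq) == len(qual)
    return ident[1:], seq, qual

name, seq, qual = parse_fastq_record(record)
print(name, seq)  # read1 GATTACA
```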

There are also newer standards that have emerged in recent years, like the Common Workflow Language (CWL), which provides specifications for describing analysis tools and workflows in a way that allows third parties to use them irrespective of the platforms and systems on which they were initially created. There's also the genomics component of the Fast Healthcare Interoperability Resources (FHIR) standard, which offers resources for integrating clinical genomic information to improve patient care via point-of-care and patient-facing apps, as well as for clinical research and analysis.
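To give a sense of what CWL looks like in practice, here is a minimal, illustrative CommandLineTool description wrapping the standard `echo` command; it is a generic sketch, not one of the consortium's examples.

```yaml
# Minimal, illustrative CWL tool description (not a consortium example).
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  out:
    type: stdout
stdout: output.txt
```

Because the tool's command line, inputs, and outputs are all declared explicitly, any CWL-aware engine can re-execute the step without guessing at how it was originally invoked.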

The consortium sought to use as many of these and other informatics standards as possible. "There was a concerted effort right from the beginning to not invent anything new but to focus on the parts that are really needed for regulatory review," Dean explained. "The thought was that the BioCompute specification would be an umbrella standard for incorporating already existing standards." That's important because "we don't have to reinvent things that are already there and we can facilitate adoption because people have already adopted the standards that we believe will likely be part of the umbrella."

The specification allows users to encapsulate software execution protocols in CWL scripts and related standards, said developer Mark Walderhaug, an associate office director in the Office of Biostatistics and Epidemiology in FDA's Center for Biologics Evaluation and Research. They can also describe the conditions in which those protocols were executed as well as the inputs and outputs that produce the observed results.

Lastly, users can define the usability and parametric domains of their protocols — what the pipeline can be used to analyze and what the analysis parameters are, respectively — as well as report any other data that is needed to execute the workflow. "There are plenty of other standards that have been involved. The [specification] serves as the connecting tissue between these different aspects of biocomputational protocol," he said.
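Putting those pieces together, a BioCompute Object is essentially a structured record with named domains. The sketch below is a simplified illustration of that layout, not the official schema; the domain names mirror those described above, and every value is hypothetical.

```python
import json

# Simplified, illustrative BioCompute-Object-style record (not the official
# schema). The domains mirror those described in the article; all values
# are hypothetical.
bco = {
    "usability_domain": [
        "Call germline variants from whole-exome FASTQ reads."
    ],
    "execution_domain": {
        # What was run and in what environment, e.g. a CWL script.
        "script": "workflows/variant-calling.cwl",
        "environment": {"os": "linux", "cpus": 8},
    },
    "parametric_domain": [
        # Analysis parameters whose values change the results if altered.
        {"param": "min_depth", "value": "10", "step": "1"},
    ],
    "io_domain": {
        "input": ["sample.fastq"],
        "output": ["variants.vcf"],
    },
}

# Serializing to JSON yields one document a reviewer can inspect end to end.
print(json.dumps(bco, indent=2))
```

The point of the structure is that a regulatory reviewer can check each domain separately: what the pipeline is for, how it was run, with what parameters, and on what inputs and outputs.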

Beyond regulatory review, there's room to use BCOs in other contexts. For example, they could provide a consistent means of communicating computational protocols between pharmaceutical companies that choose to farm out their analysis needs to bioinformatics service providers, making it easier for them to track the versions of software used and possibly reproduce the analysis internally, Mazumder noted. Other use cases include clinical trials or academic research settings where projects are sometimes discontinued because the lead researcher left the lab.

The consortium members interviewed for this story hope to see the specification broadly adopted, and they believe that one of its key selling points will be that it offers a consistent format for submitting computational procedures and protocols to the FDA. "Right now, the options will be you write things down in Microsoft Word … and say this is what I did," Mazumder said. "It could be one small paragraph of what you did with not all the pertinent information there … or you can send hundreds of pages of what you did, either way it is very different from … submission to submission."

However, they caution that the language is still evolving. At this time, CBER is not requiring that the community use BioCompute objects in their communications with its office in part because the specification is still so new and such a move would be premature without proper testing, Simonyan said.

Mazumder expressed similar sentiments in his comments. "We struggled with the first iteration of the document [in terms of] how detailed we should be because every field has its own nuances," he said. "I think it will evolve over time frankly. [We're] hoping to get feedback from folks outside of FDA like pharma companies or bioinformatics platform companies."

To that end, the developers have published several sample BCOs as examples. These come from researchers at Harvard University and Seven Bridges Genomics, among others. "The more use cases we have, the better we are in terms of understanding these nuances," Mazumder said. The consortium has also prepared a brochure to help researchers who want to use the specification to discuss the details and benefits with their companies.

Simonyan also hopes that the potential cost and time savings will further incentivize stakeholders to use the specification. CBER has begun recommending the specification to industry stakeholders in its meetings for use in their submissions with an eye towards more rigorous testing of the specification. The office is also reaching out to other arms of the FDA hoping for more parties to put the specification through its paces.

Eventually, CBER could publish guidance regarding the use of the BCOs and may even formally embed them in its regulatory review processes but that is still a long way away, Simonyan noted. For now, the focus is simply on testing.

Gil Alterovitz, director of Harvard Medical School's Biomedical Cybernetics Laboratory and one of the major contributors to the BCO, said that his team has already added mechanisms for linking BCOs directly to the FHIR Genomics specification. "That means it's part of an [American Medical Centers]-accredited standard and that's very important," he said. "Because other agencies whether they be government or industry sometimes require or like to see that a product was developed in collaboration or somehow linked to a [Standards Development Organization]."

Once the BCO framework is established, it could be used for other types of FDA submissions such as large clinical trials, according to its developers. Furthermore, although version one focuses solely on NGS data analysis, the consortium is considering how the specification could be used in descriptions of other kinds of analytics, such as those that use natural language processing, or to describe the results of simulations, Walderhaug said. "We've communicated with our stakeholders and we've developed this open consortium so that we can get back as much feedback as possible," he said. "It's not just an FDA standard, it's a community standard."

Dean said that his firm has already seen interest in the specification from big pharma companies. Although he could not go into specifics due to non-disclosure agreements, he did say that there have been requests for information about what they need to know to incorporate the BCOs in their internal processes.

Meanwhile, Seven Bridges is ensuring that its internal workflows are consistent with the BCOs and will also spread the word about the specification in talks and presentations moving forward, Dean said. The company is also putting together best practices for including BioCompute tags within its existing projects as well as exploring technology improvements and advances that will make it easier for its pharma clients to share their computational protocols with the FDA.
