As COVID-19 Emergency Ends, Bioinformaticians Look Back to Inform Future Pandemic Preparedness


NEW YORK – With the US public health emergency officially ending Thursday and the World Health Organization making a similar declaration last week, the COVID-19 pandemic is effectively over, at least in the minds of key policymakers, even as SARS-CoV-2 continues to circulate and mutate.

The initial response to the discovery of the then-novel coronavirus at the end of 2019 and into 2020 was unprecedented in speed and scope, including for the genomics and bioinformatics communities. Many of those same researchers are hoping that the work they have done in the last three years can inform the response to the next widespread outbreak of an infectious disease, be it a pandemic or merely a regionalized epidemic.

With the expiration of the emergency, the Centers for Disease Control and Prevention (CDC) will stop tracking and reporting community levels of COVID-19 and test positivity rates, but tracking of COVID-19 hospitalizations and ICU admissions will continue through April 30, 2024. The CDC will use data from the National Vital Statistics System to track mortality and will monitor SARS-CoV-2 variants with wastewater testing, but reporting on variants will change from weekly to biweekly.

The federal government will no longer cover the costs of vaccines, tests, and treatments. Requirements will loosen for private health insurers as well, though vaccines generally will be free to insured individuals because they count as preventive care.

Overseas, the European COVID-19 Data Portal is remaining open and is still accepting data submissions. The same goes for the UK-based Wellcome Sanger Institute-led COVID-19 Cell Atlas.

ELIXIR, the European life sciences infrastructure for biological information, has funding through October 2024 for a separate project called BeYond COVID that aims to support genomic surveillance of infectious diseases as well as connect genomics data with public health and real-world healthcare data.

The US has no formal equivalent of BeYond COVID, but American bioinformaticians and virologists are indeed looking beyond COVID.

Data decommissioning

John O'Horo, an infectious disease specialist and clinical IT researcher at Mayo Clinic, is concerned that much of the bioinformatics infrastructure built at the beginning of the pandemic was assembled hastily and in an ad hoc manner. "A lot of it's being decommissioned" as funding runs out, O'Horo said.

He expressed particular concern with the end of state and national reporting requirements. However, testing had largely shifted from laboratory-based PCR assays to home rapid antigen kits about a year into the pandemic, so accurate case data became scarce a long time ago.

"Just because of that … there isn't a data cliff that we're going to be falling off at this point because that data has become progressively less valuable," O'Horo said. However, nothing has replaced the COVID-19 data and forecasting reports from the CDC, Johns Hopkins, and the University of Washington's Institute for Health Metrics and Evaluation (IHME) that were so popular two years ago.

O'Horo said that such interpretive reports were invaluable to him as a clinician to understand the prevalence of the outbreak in local communities.

"The pandemic going away [and] the data quality going down because we have home testing … are all very good things," O'Horo said. But, he added, the earlier efforts highlighted the importance of having data-sharing agreements and consortia.

When the Omicron variant first appeared in late 2021, Tulio de Oliveira from the University of KwaZulu-Natal in South Africa actually disseminated the first viral sequence in a tweet.

"At that point in COVID, Twitter was a very effective, measured way to shout your findings from the rooftops," O'Horo said. Other scientists did quickly upload the sequence into more formal research databases.

"There were tools in place to share some of the more detailed information, and that's what I'm really hoping that we can have for the next pandemic, hopefully a better tool than Twitter for this kind of reporting," O'Horo said. "Continued ability to data-share is going to be critically important for figuring out what is the next pandemic or getting a head start on things like sequencing new subvariants if it's COVID or the next strains of flu."

From niche to necessary

Serghei Mangul, assistant professor of clinical pharmacy and quantitative and computational biology at the University of Southern California, said that bioinformatics for viral sequencing was a "niche thing" before COVID-19. He suggested that it was difficult to publish about such tools in general medical journals, not just those focused on viruses.

But tools did exist.

In 2019, before the pandemic, the Chan Zuckerberg Initiative developed a free, cloud-based data-analysis platform to identify pathogens and other microbes from sequencing data. The technology, dubbed Chan Zuckerberg ID (CZ ID), has been making inroads with infectious disease researchers around the world, in part due to its capability to process long-read data from Oxford Nanopore Technologies, which offers a series of small, portable sequencers suitable for deployment in low-resource environments.

In 2020, bioinformaticians at the Center for Genomic Regulation (CRG) in Barcelona, Spain, quickly built a platform that allows scientists worldwide to analyze raw and consensus COVID-19 sequencing data to compare genomic, proteomic, structural, and motif variability of SARS-CoV-2.

Called the COVID-19 Viral Beacon, the resource draws, in part, on the preexisting Global Initiative on Sharing All Influenza Data (GISAID). It also leans on sources including the European Nucleotide Archive (ENA) and the US National Center for Biotechnology Information's Sequence Read Archive (NCBI/SRA), and now contains genetic variants and associated metadata in a collection of more than 100,000 viral sequences.

Nextstrain, an open-source pathogen genome project created by an international coalition, was also quickly repurposed for SARS-CoV-2.

However, GISAID is not great for storing metadata such as when and how a sample was collected and the severity of a patient's infection, according to Mangul, who was corresponding author of a 2022 Nature Methods paper about bioinformatics for COVID-19 and future pandemics.

Mangul said that it would be helpful to have that kind of information for faster identification of the lethality of viral strains, but he called epidemiological metadata the "Wild West" because there are not really any relevant interoperability standards.

"People might not appreciate that they need to do it. They're just not aware of the tremendous value of the metadata that they have," he said.

"It would be ideal if all the metadata would be available and would go back to GISAID," Mangul said. "There's so many interesting fundamental biological questions you can ask about, [like] the evolution of the virus [and] how it's transmitted."

GISAID currently stores only annotated sequences, not raw sequencing data. While some researchers have shared raw data in the Sequence Read Archive, Mangul noted that it is unwieldy to switch back and forth between repositories that are not linked together.

"I would like to have raw data because I also would like to double-check that something was done in the best way," he said. The full sequences also allow for reanalyzing data with new bioinformatics tools, particularly since so much viral genome assembly was done in haste at the beginning of the pandemic.

"Retrospectively, we can do a better job, and I think we should," Mangul said. But databases were not designed to encourage sharing.

The Nature Methods paper found that about 80 percent of genomic data in GISAID came off Illumina sequencers. Mangul said that was just "a matter of choice" as well as what was available at the time, but that the database does support long-read sequences.

He noted that the error rate of Pacific Biosciences long-read sequencers has come down and is now comparable with short-read Illumina sequences. "Especially for viruses, both PacBio and [Oxford] Nanopore are very promising because they cover longer stretches of viral genomes," Mangul said.

Mangul said that as COVID-19 transitions from pandemic to endemic, it would be helpful to see more benchmarking studies looking at accuracy between sequencing manufacturers as well as the efficacy of various bioinformatics tools.

"Does it make results better if you run long-read technology? If so, it would be nice to have evidence," he said.

Benchmarking studies by definition cannot be prospective, but Mangul is optimistic that benchmarking can be run in parallel with biosurveillance so epidemiologists and bioinformaticians can adjust on the fly.

The next pandemic

O'Horo said at last month's Healthcare Information and Management Systems Society (HIMSS) conference that the pathogen that will cause the next pandemic or major epidemic probably already exists, so it would behoove the bioinformatics and epidemiological communities to get ready now and not forget the lessons of COVID-19. 

The Mayo physician said that data-sharing agreements set up since early 2020 will help restart shuttered efforts in the future, but the emergency declarations at the national level enabled much of the data flow. He also said that the US data network may not have ever hit the "sweet spot" of facilitating viral data exchange without adding heavy reporting and administrative burdens.

O'Horo said that influenza — an endemic virus that is perennially mutating and has seasonal prevalence — could be the template for research into new COVID-19 therapeutic and vaccination development. 

"I think that what the COVID pandemic taught us is that we have excellent biotechnology resources for response," O'Horo said. "Look at how fast it went for us to sequence [the virus], to have our first vaccine candidates, to have our first treatment candidates that panned out."

While this pandemic produced the first widely distributed mRNA vaccines, much of the research was rooted in the 2003-04 SARS-CoV-1 and 2012 MERS coronavirus outbreaks.

"That's all very important groundwork that was done, and we certainly got a lot more groundwork done for the next epidemic," O'Horo said.

However, there still is a lot to learn about translating bioinformatics research into public awareness of infectious diseases, he said. After all, the COVID-19 response and the 2021 vaccine rollout became politicized and polarizing, often drowning out the science. 

"[Politicization] is also a risk, but without good data, we can't make good public health recommendations or individual health recommendations," O'Horo said, and the earlier the framework is in place at the start of an epidemic, the better. "That's a critical period for public health interventions and education because you have people's attention," he noted.

O'Horo said that there is no good early-warning network for genomic biosurveillance. He noted that "traditional" epidemiological methods identified the monkeypox outbreak last year, but it was more reactive than proactive. It's useful to know these methods work, he noted, "but we're not taking advantage of some of the technology from the epidemiologic informatics standpoint until well after this has occurred."

A small but well-connected startup software company wants to play a major role in building such a biosurveillance network.

This month, New York-based Biotia — cofounded by Weill Cornell Medicine geneticist and computational biologist Christopher Mason — formally launched a program called GeoSeeq Watchtower, offering support to groups around the world to profile emerging risks for outbreaks by monitoring infectious disease "hotspots" with the help of genomics. The program, partially backed by the Rockefeller Foundation, is a component of a planned early-warning technology platform called GeoSeeq.

CEO and Cofounder Niamh O'Hara, alongside Biotia partners from Brazil, Germany, South Korea, and the US, presented GeoSeeq Watchtower to the United Nations General Assembly in February.

"We need a global early-warning system," O'Hara told GenomeWeb. She called current surveillance systems "disparate," "inequitable," and "siloed," unfit for pathogens that cross borders. "This needs to be an international effort."

Biotia, a graduate of the Mayo Clinic's Platform Accelerate program, has built a next-generation sequencing-based microbial surveillance service.

O'Hara, who has a Ph.D. in evolutionary biology, called Watchtower a "local solution" for biosurveillance of pathogens that uses the Biotia software. "We are looking at diagnostic deserts and areas that are underserved or could be hotspots for emerging pathogens," she said.

The 12 organizational participants in Watchtower cover eight countries on four continents and are all members of the Metagenomics and Metadesign of Subways and Urban Biomes, or MetaSUB, consortium, of which Mason serves as medical director.

Biotia Chief Technology Officer David Danko, who led bioinformatics for MetaSUB while he was a graduate student, said there is only an informal connection between Biotia and MetaSUB, though both are focused on metagenomics.

Danko, who has a background in environmental metagenomics, urban biosurveillance, and artificial intelligence for bioinformatics, said that the company and the Watchtower affiliates are interested in looking for signs of novel pathogens. Biotia is particularly focused on metatranscriptomic data for early detection of pathogens, mostly because viruses that infect humans tend to be RNA-based.

"The metatranscriptomic part is so that we have as broad a base as possible," Danko said. "PCR tests and panel tests, they're great if you already know what pathogen or pathogens might be a risk, but for novel detection, you really need an unbiased approach."

The informatics behind Watchtower are relatively straightforward. Biotia takes a FASTQ file and runs it against a set of reference genomes, then annotates it with a list of known microbes that are present.
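As a rough illustration of that kind of workflow, the Python sketch below reads FASTQ records, matches each read against a set of reference genomes by shared k-mers, and tallies which known microbes are present. It is a toy example under stated assumptions, not Biotia's pipeline: the function names, the k-mer size, the 50 percent match threshold, and the made-up reference sequences are all hypothetical, and production tools use far more sophisticated alignment and classification.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (ignored in this sketch)
        yield header[1:], seq.upper()


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the names of the reference genomes containing it."""
    index = {}
    for name, genome in references.items():
        for kmer in kmers(genome.upper(), k):
            index.setdefault(kmer, set()).add(name)
    return index


def annotate(handle, references, k=8, min_fraction=0.5):
    """Count reads per known microbe; return (hits, unclassified_reads)."""
    index = build_index(references, k)
    hits = Counter()
    unclassified = 0
    for _, seq in read_fastq(handle):
        read_kmers = kmers(seq, k)
        votes = Counter()
        for kmer in read_kmers:
            for name in index.get(kmer, ()):
                votes[name] += 1
        if votes:
            best, count = votes.most_common(1)[0]
            if count / len(read_kmers) >= min_fraction:
                hits[best] += 1
                continue
        unclassified += 1  # no reference explains this read
    return dict(hits), unclassified


# Hypothetical toy references and reads, for illustration only.
references = {
    "virus_a": "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACG",
    "bacterium_b": "TTGACCGGTTAACCGGTATATCGCGCGATATGCAT",
}
reads = [references["virus_a"][5:25], references["bacterium_b"][3:23], "G" * 20]
fastq_text = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads)
)
hits, unclassified = annotate(StringIO(fastq_text), references)
print(hits, unclassified)
```

Reads that match no reference end up in the unclassified count, which is where a novel pathogen would hide in this simplified picture.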

Reference genomes do not exist for novel pathogens, however, though Danko said that unknown sequences make the work "exceptionally interesting." Clues may reside in gene edits and genome fragments.

"The main challenge with surveilling for pathogens like that is that there's very little opportunity to validate your techniques," he noted. "You don't know [if] your technique doesn't work until it's too late."

Plus, borders are irrelevant to pathogens, and international data sharing is difficult because some organizations and national health ministries want to retain ownership of their data for both cultural and legal reasons, O'Hara noted. Federated informatics architecture and clear data-sharing agreements overcome some of these hurdles.

Early detection

Proactive biosurveillance is the goal, but it is not yet feasible.

"I'm sure science is going to get to the point where we can do some crazy computational modeling and know exactly what's going to be a pathogen before it infects anyone," Danko said. "[But] what we want to do right now is to be able to identify what's a high-risk pathogen event very early on and then help local communities be able to react to that and prevent further spread."

Mangul, a bioinformatics researcher in virology, and colleagues at USC are now looking to develop informatics for "passive" surveillance of wastewater to compensate for the fact that there is far less COVID-19 sequencing data being generated. "Wastewater can be noninteractive surveillance," he said.

"You can actually tell what strains are there so you can estimate the changes of the prevalence of the strains in the population," Mangul added. "But perhaps you can also assemble novel strains or lineages" from sequencing of wastewater samples.

With proper benchmarking data, researchers could see if a sequence includes a novel viral strain previously missing from the database. 
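The idea of estimating changes in strain prevalence from wastewater can be sketched in a few lines of Python. This is a simplified illustration, not the USC group's method: it assumes sequencing reads have already been assigned to lineages, treats each lineage's share of reads as its relative abundance, and compares two sampling dates. The function names and lineage labels are hypothetical.

```python
def relative_abundance(read_counts):
    """Convert per-lineage read counts into fractions of all assigned reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any lineage")
    return {lineage: n / total for lineage, n in read_counts.items()}


def prevalence_change(earlier, later):
    """Return the change in relative abundance per lineage between two samples."""
    before = relative_abundance(earlier)
    after = relative_abundance(later)
    lineages = sorted(set(before) | set(after))
    # A lineage absent from one sample is treated as having zero abundance there.
    return {name: after.get(name, 0.0) - before.get(name, 0.0) for name in lineages}


# Hypothetical read counts from two wastewater samples taken weeks apart.
week_1 = {"lineage_A": 80, "lineage_B": 20}
week_5 = {"lineage_A": 50, "lineage_B": 40, "lineage_C": 10}
for name, delta in prevalence_change(week_1, week_5).items():
    print(f"{name}: {delta:+.2f}")
```

A lineage that appears only in the later sample, such as the hypothetical lineage_C here, shows up as a positive change from zero. Read share is only a proxy for prevalence in the population, so real analyses would also need to account for sampling and sequencing biases.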

"Moving forward, from a bioinformatics perspective, we can make sure databases are better and we can make sure tools are better," Mangul said.

"Some choices that we made during the pandemic were not informative and not data-driven because we didn't have the data or we were in a rush to implement things," Mangul continued. "There is so much to learn from that and improve our bioinformatics, improve sequencing, and improve experimental protocols so that we are ready" for the next public health emergency.