SARS-CoV-2 Sequence Analysis Points to Extensive Within-Host Viral Diversity

NEW YORK – Individuals with COVID-19 may carry a collection of SARS-CoV-2 viruses with a range of potentially informative genetic variants, according to a new genome sequence-based analysis by a Case Western Reserve University-led team.

"Our work brings attention to the complexity of infectious diseases that is often oversimplified when considering only the most abundant virus in an infection, and we demonstrate the importance of examining the variations that are historically considered noise," first author Ernest Chan, bioinformatics core director at Case Western Reserve School of Medicine's Cleveland Institute for Computational Biology, said in a statement.

More than 1.8 million SARS-CoV-2 genomes were housed in international sequence databases by the spring of 2021, the team noted, and ongoing research has shown how the increasingly diverse forms of SARS-CoV-2 that have emerged can affect everything from the virus's transmissibility and ability to evade vaccine protection to COVID-19 severity and the accuracy of diagnostic tests.

For a paper appearing in PLOS Genetics on Friday, the researchers performed whole-genome sequencing on more than 250 SARS-CoV-2 isolates from patients and healthcare workers in the VA Northeast Ohio Healthcare System, analyzing those sequences alongside 110 additional SARS-CoV-2 genomes from international sequence repositories such as Nextstrain and the Global Initiative on Sharing All Influenza Data (GISAID).

"Data provided through [GISAID] and NextStrain provides a continuously updating report on SARS-CoV-2 lineage classification and illustrates that an apparently low mutation rate is no guarantee that the virus will exhibit limited capacity for variation, particularly once it has become so widely dispersed through millions of infections a day," the authors explained, adding that "[a] further potential source of sequence complexity includes involvement of more than one variant in an individual infection."

By incorporating ambiguity codes (nucleotide symbols that flag positions where more than one base was detected) from previously reported viral sequences, the researchers saw "significant within-host infection diversity," with complex mixtures of SARS-CoV-2 sequences infecting individuals and being passed between them. Their results pointed to multiple versions of SARS-CoV-2 in each of the COVID-19 cases considered.

By digging into these heterogeneous viral mixtures, the group got a glimpse of SARS-CoV-2 evolution in the context of COVID-19 dynamics and viral transmission events.

From these and other results, the researchers argued that consensus genome sequence reporting, which summarizes each infection as a single sequence, may inadvertently miss the full viral diversity involved in individual COVID-19 infections and may underestimate the complexity behind the transmission of SARS-CoV-2 strains and variants.
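To make the consensus-versus-mixture point concrete, here is a minimal Python sketch, assuming hypothetical per-position read counts and an arbitrary 20% minority-base threshold; it illustrates the general idea and is not the study's pipeline. A majority-rule consensus keeps only the most abundant base at each position, whereas a standard IUPAC ambiguity code (for example, Y for a C/T mixture) can record that two bases were both present.

```python
from collections import Counter

# Standard IUPAC nucleotide codes for two-base mixtures
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("GC"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def majority_call(counts: Counter) -> str:
    """Return the single most abundant base, as a simple consensus would."""
    return counts.most_common(1)[0][0]


def ambiguity_call(counts: Counter, min_fraction: float = 0.2) -> str:
    """Return an IUPAC code when a second base reaches min_fraction.

    min_fraction is a hypothetical cutoff chosen for illustration only.
    """
    total = sum(counts.values())
    present = {base for base, n in counts.items() if n / total >= min_fraction}
    if len(present) == 1:
        return present.pop()
    # This sketch only handles two-base mixtures; anything else becomes N
    return IUPAC_PAIRS.get(frozenset(present), "N")


# Hypothetical read counts at three genome positions in one sample
positions = [
    Counter({"A": 98, "G": 2}),   # essentially a single base
    Counter({"C": 70, "T": 30}),  # minority variant at 30%
    Counter({"G": 55, "A": 45}),  # near-even mixture
]

consensus = "".join(majority_call(c) for c in positions)
with_ambiguity = "".join(ambiguity_call(c) for c in positions)
print(consensus)       # ACG
print(with_ambiguity)  # AYR
```

Both outputs describe the same sample, but only the second shows that two of the three positions carried a mixture of bases.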

Chan also noted that "genetic variants observed in low frequency in SARS-CoV-2 infections can be early indicators of new strains responsible for later transmission surges."

More generally, the authors explained, "[a]dequately addressing the under-reporting of infection complexity represented in data repositories has significant potential to alter vitally important characteristics of infectious diseases, including assessment of drug treatment efficacy and resistance and vaccine escape."

Even so, they cautioned that the SARS-CoV-2 conclusions in the current study are based on a small set of COVID-19 cases, which did not allow for more detailed analyses of heterogeneous viral mixtures in specific patient subgroups, such as individuals with distinct genetic susceptibilities, treatment histories, or varying infection lengths and experiences.