NEW YORK – Using a new modeling method to dig into whole-genome sequences for millions of SARS-CoV-2 isolates, a team led by researchers from the Broad Institute, Massachusetts General Hospital, and Harvard University has unearthed sequence substitutions that seem to boost the fitness of the coronavirus behind the ongoing COVID-19 pandemic.
As they reported in Science on Tuesday, the researchers relied on a hierarchical Bayesian multinomial logistic regression modeling method known as PyR0 to assess viral lineage distributions, geographic prevalence, and estimated fitness in relation to viral mutations based on publicly available genome sequences for close to 6.4 million SARS-CoV-2 isolates collected globally.
"[P]yR0 provides a genome-wide, automated approach for detecting viral lineages with increased fitness," Jacob Lemieux and Pardis Sabeti, the study's co-senior and co-corresponding authors, and their colleagues wrote. "By combining a model-based assessment of lineage fitness with absolute case counts, our model provides a global picture of the events of the first two years of the pandemic."
Lemieux is affiliated with the Broad Institute and Massachusetts General Hospital, while Sabeti is affiliated with the Broad, Harvard, the Massachusetts Consortium on Pathogen Readiness, and the Howard Hughes Medical Institute.
Together, the analyses uncovered substitutions in spike protein coding sequences and beyond that appear to have bumped up SARS-CoV-2 fitness over time. Outside of the spike protein, for example, the team flagged apparently fitness-enhancing substitutions in sequences coding for the nucleocapsid protein or for non-structural proteins.
"Applied to the full set of publicly available SARS-CoV-2 genomes, [PyR0] provides a genomic view of the mutations driving increased fitness of the virus, identifying experimentally established driver mutations" in the spike-coding sequence and highlighting the role of non-spike mutations, the authors wrote.
They further noted the importance of identifying mutations in the nucleocapsid-coding sequence, ORF1b, and ORF1a, which they said have received relatively less research attention.
The modeling approach also made it possible to predict the relative fitness and potential growth patterns for newly detected SARS-CoV-2 lineages based on genome sequence profiles, the researchers explained.
"PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization," they wrote.
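For readers who want a feel for what forecasting growth "from a mutational profile" can mean, the idea can be illustrated with a toy multinomial logistic model. The sketch below is not PyR0's code and leaves out its Bayesian inference step. It assumes, purely for illustration, that a lineage's growth rate is the sum of per-mutation effects, and every name and number in it is hypothetical.

```python
import numpy as np

def lineage_growth_rates(mutation_matrix, mutation_effects):
    """Assumed additive model: a lineage's growth rate is the sum of
    the effects of the mutations it carries."""
    return mutation_matrix @ mutation_effects

def forecast_proportions(intercepts, growth_rates, times):
    """Multinomial logistic forecast: at each time point, take a softmax
    over lineages of (intercept + growth_rate * time)."""
    logits = intercepts[None, :] + np.outer(times, growth_rates)
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical toy data: 3 lineages x 4 mutations (1 = lineage carries it).
mutation_matrix = np.array([
    [0, 0, 0, 0],   # ancestral lineage
    [1, 1, 0, 0],   # lineage with two substitutions
    [1, 1, 1, 1],   # lineage with four substitutions
], dtype=float)
# Made-up per-day effects on log growth rate, one per mutation.
mutation_effects = np.array([0.02, 0.01, 0.03, 0.00])
# Made-up starting log-odds: newer lineages begin rare.
intercepts = np.array([0.0, -3.0, -6.0])
times = np.arange(0, 181, 30)  # days 0, 30, ..., 180

rates = lineage_growth_rates(mutation_matrix, mutation_effects)
proportions = forecast_proportions(intercepts, rates, times)
for day, row in zip(times, proportions):
    print(day, np.round(row, 3))
```

In this toy setup the lineage carrying the most growth-associated substitutions starts out rare and comes to dominate by the end of the window. That mirrors, in a very simplified way, the replacement pattern the researchers describe.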
More broadly, the investigators pointed out that their new model can provide a peek at the molecular processes behind enhanced viral fitness, while revealing wider viral evolution patterns over the course of the pandemic. Based on data for lineages characterized so far, they concluded that major lineages tend to split into comparably fit sub-lineages that are then replaced by new lineages or variants with still further fitness increases.
"Some lineages increased in fitness more than others over the course of the pandemic," the authors reported, noting that sub-lineage fitness differences hint that "the propensity to acquire new spike mutations depends on the constellation of mutations that comprise a lineage, consistent with epistasis."