NEW YORK – A team from Bangladesh has characterized variants in global SARS-CoV-2 genomes from strains circulating as of the end of March — part of that group's genetic search for viral sequence shifts that may eventually be used to understand COVID-19 epidemiology as well as the trajectory of the ongoing pandemic.
"[T]he present method of genome annotation employed at this early pandemic stage could be a promising tool for monitoring and tracking the continuously evolving pandemic situation, the associated genetic variants, and their implications for the development of effective control and prophylaxis strategies," senior author Anwar Hossain and his colleagues wrote in a paper published in Scientific Reports on Wednesday.
Hossain was a microbiology researcher at the University of Dhaka when the study was performed and is currently based at the Jashore University of Science and Technology, also in Bangladesh.
The researchers relied on a multiple sequence alignment method called MEGA to search for sequence differences in 2,492 complete or nearly complete SARS-CoV-2 genome sequences from GISAID, after tossing out low-quality and ambiguous sequences in the dataset. The collection includes sequences submitted to the GISAID database by the end of March, they explained, and represents COVID-19 infections in individuals from 58 countries on half a dozen continents.
Among other genetic features, the team tracked down 1,516 single nucleotide variants in the SARS-CoV-2 genome, along with deletions falling in protein-coding or non-coding portions of the viral genome, some of them limited to subsets of strains from certain parts of the world. By digging into such deletions, the group attempted to pick up alterations that may coincide with documented clinical features in those regions.
The analyses also delved into mutation patterns occurring at the gene level in the collection of sequences from the novel coronavirus, while providing a look at more than 700 amino acid substitutions that accumulated in the first few months of its spread.
While amino acid residues in the SARS-CoV-2 spike protein's host-interacting receptor-binding domain were largely conserved, for example, the researchers saw signs that the alterations affecting amino acid sequences more broadly may have been cropping up with slightly different frequencies depending on the continental cluster.
They also took a crack at searching for potential ties between geographical locales, climate, viral genetic features, case fatality rates, and more, though they warned that the study is limited by the data available at the time of the analyses.
Along with other limitations, for example, the authors noted that "many countries have not sequenced enough virus samples … and some countries uploaded sequences collected from samples of single-source or zone of infection."
On the clinical correlation side, they noted that "reported disease severity ([which] may not represent the actual severity) might be affected by several other factors, for example, healthcare facilities, average age group, genetic context of the population, and control strategies adopted by the countries."
The authors called for additional research on the genetic variation that exists in SARS-CoV-2, suggesting that "investigations should focus on structural validations and subsequent phenotypic consequences of the deletions and/or mismatches in transmission dynamics of the current epidemics and the immediate implications of these genomic markers to develop potential prophylaxis and mitigation for tackling the crisis of pandemic COVID-19."
Other teams, including members of the Nextstrain team, have been using large-scale genome sequence analyses to track SARS-CoV-2 transmission patterns, its potential origins, and subtle genetic changes that have arisen in strains stretching back to the initial documented outbreak in China.
In a paper published in The Lancet earlier this week, researchers from Singapore presented data on almost 300 individuals with confirmed SARS-CoV-2 infections from the PROTECT study, reporting proposed ties between milder COVID-19 cases and versions of the virus carrying the same open reading frame 8 deletions.