NEW YORK – Researchers at the Chinese Center for Disease Control and Prevention and their collaborators have conducted an in-depth annotation of the newly discovered coronavirus (2019-nCoV) genome, finding several differences between 2019-nCoV and severe acute respiratory syndrome (SARS) or SARS-like coronaviruses.
In a study published recently in Cell Host & Microbe, the researchers described their systematic comparison of 2019-nCoV and several other SARS and SARS-like viruses, identifying 380 amino acid substitutions between these coronaviruses that may have caused a functional and pathogenic divergence in 2019-nCoV.
Coronaviruses are genetically classified into four major genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus. Six kinds of human coronaviruses have been previously identified, including HCoV-NL63 and HCoV-229E, which belong to the Alphacoronavirus genus; and HCoV-OC43, HCoV-HKU1, severe acute respiratory syndrome coronavirus (SARS-CoV), and Middle East respiratory syndrome coronavirus (MERS-CoV), which all belong to the Betacoronavirus genus.
Coronavirus genomes range from approximately 26,000 to 32,000 bases and include six to 11 open reading frames (ORFs), the researchers noted. The first ORF represents approximately 67 percent of the entire genome and encodes 16 non-structural proteins, while the remaining ORFs encode accessory proteins and structural proteins. The four major structural proteins are the spike surface glycoprotein (S), small envelope protein (E), matrix protein (M), and nucleocapsid protein (N). The spike surface glycoprotein plays an essential role in binding to receptors on the host cell and determines host tropism. The spike proteins of SARS-CoV and MERS-CoV bind to different host receptors via different receptor-binding domains.
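For readers less familiar with the term, an ORF is a stretch of sequence that begins with a start codon (ATG) and runs in frame to a stop codon. The Python sketch below is a simplified, hypothetical helper (the names `find_orfs` and `longest_orf_fraction` are illustrative, not from the study) that scans the three forward reading frames and reports what fraction of a sequence its longest ORF covers. It is a teaching aid under stated assumptions, not a genome annotation tool: run on a real coronavirus genome it would not return a single ORF covering about 67 percent, because the first ORF (ORF1ab) is translated via a programmed -1 ribosomal frameshift that a naive scanner like this one ignores.

```python
# Hypothetical teaching sketch; not the study's annotation pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, end-exclusive coordinates of forward-strand ORFs.

    An ORF here starts at an ATG and ends at the first in-frame stop codon
    (the stop codon is included in the span). Only the three forward frames
    are scanned; the reverse complement and ribosomal frameshifts are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through this frame one codon (3 bases) at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # ORF length in codons, not counting the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def longest_orf_fraction(seq: str) -> float:
    """Fraction of the sequence covered by its longest forward-strand ORF."""
    orfs = find_orfs(seq)
    if not orfs:
        return 0.0
    start, end = max(orfs, key=lambda span: span[1] - span[0])
    return (end - start) / len(seq)


toy = "ATGAAATAGCCATGTTTGGGTAA"  # made-up sequence, not viral data
print(find_orfs(toy))                       # [(0, 9), (11, 23)]
print(round(longest_orf_fraction(toy), 3))  # 0.522
```

In this toy string the longest ORF spans 12 of 23 bases; the same ratio computed on an annotated genome is what a figure like "67 percent" summarizes.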
For this study, the researchers performed in-depth genome annotations on the first three 2019-nCoV genomes to be determined (HB01, HB04, and HB05) and compared them to related coronaviruses, including 1,008 human SARS-CoV, 338 bat SARS-like CoV, and 3,131 human MERS-CoV sequences.
At the amino acid level, the researchers found that 2019-nCoV was quite similar to SARS-CoV, but they also saw some notable differences. For example, the 8a protein was present in SARS-CoV but absent in 2019-nCoV. The 8b protein was 84 amino acids long in SARS-CoV, but 121 amino acids long in 2019-nCoV. In contrast, the 3b protein was 154 amino acids long in SARS-CoV, but only 22 amino acids long in 2019-nCoV.
"Further studies are needed to characterize how these differences affect the functionality and pathogenesis of 2019-nCoV," the authors wrote.
Based on a phylogenetic analysis of the whole genomes of the various viruses, the researchers found that 2019-nCoV fell in the same Betacoronavirus clade as MERS-CoV, SARS-like bat CoV, and SARS-CoV, but that it had the highest similarity to a SARS-like bat CoV and was more distantly related to MERS-CoV.
These findings are similar to those of another group of Chinese researchers who published a study in The Lancet at the end of January, noting that 2019-nCoV was genetically distinct from the SARS virus that caused the epidemic in 2002 and 2003, as well as from the MERS virus that was detected in 2012.
Phylogenetic analyses of the whole genome and of individual genes clearly showed that 2019-nCoV was most closely related to SARS-like bat viruses. However, the researchers said they did not find any single SARS-like bat virus strain in which every protein was the closest match to its 2019-nCoV counterpart.
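The reasoning behind that finding can be illustrated with a per-protein comparison: for each 2019-nCoV protein, find the bat strain with the highest sequence identity, then check whether one strain wins every protein. The sketch below assumes the protein sequences have already been aligned to equal length (with `-` for gaps); the function names, strain names, and sequences are hypothetical toy values, and this is not presented as the authors' actual method.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def best_strain_per_protein(
    query: dict[str, str], references: dict[str, dict[str, str]]
) -> dict[str, str]:
    """For each query protein, name the reference strain with the highest identity.

    Ties go to the strain listed first in `references`.
    """
    best = {}
    for protein, q_seq in query.items():
        scores = {
            strain: percent_identity(q_seq, proteins[protein])
            for strain, proteins in references.items()
            if protein in proteins
        }
        if scores:
            best[protein] = max(scores, key=scores.get)
    return best


# Toy, made-up aligned fragments (not real viral sequences).
query = {"S": "MFVFLV", "N": "MSDNGP"}
bat_strains = {
    "batA": {"S": "MFVFLV", "N": "MSDQGP"},
    "batB": {"S": "MFIFLL", "N": "MSDNGP"},
}
best = best_strain_per_protein(query, bat_strains)
print(best)                          # {'S': 'batA', 'N': 'batB'}
print(len(set(best.values())) == 1)  # False: no single strain wins every protein
```

A result like this, where different strains are the closest match for different proteins, is the pattern the researchers describe: close overall relatedness without any one bat strain being the best match across the board.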
"Given the close relationship between 2019-nCoV and SARS-CoVs or SARS-like bat CoVs, an examination of the amino acid substitutions in different proteins could shed light into how 2019-nCoV differs structurally and functionally from SARS-CoVs," the authors concluded. "In total, there were 380 amino acid substitutions between the amino acid sequences of 2019-nCoV (HB01) and the corresponding consensus sequences of SARS and SARS-like viruses."
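Counting substitutions against a consensus sequence, as the authors describe, involves two steps: build a majority-rule consensus from a group of aligned reference sequences, then list the positions where the query differs from it. The Python sketch below assumes pre-aligned sequences and a simple majority vote per column; the function names and toy sequences are hypothetical, and the article does not describe the authors' exact alignment or consensus rules.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority-rule consensus of pre-aligned sequences (gaps count as a symbol).

    Ties go to the residue encountered first in that column.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def substitutions(query: str, cons: str) -> list[str]:
    """List substitutions as <consensus residue><column><query residue>, e.g. 'T3S'.

    Columns are 1-based alignment positions; columns with a gap are skipped.
    """
    if len(query) != len(cons):
        raise ValueError("query and consensus must be aligned to the same length")
    return [
        f"{c}{i}{q}"
        for i, (q, c) in enumerate(zip(query, cons), start=1)
        if q != "-" and c != "-" and q != c
    ]


# Toy, made-up aligned fragments standing in for one protein.
reference_group = ["MKTAYI", "MKTAHI", "MRTAYI"]
cons = consensus(reference_group)
print(cons)                           # MKTAYI
print(substitutions("MKSAYV", cons))  # ['T3S', 'I6V']
```

Repeating this for each protein and summing the list lengths would give a genome-wide substitution count analogous in spirit to the 380 reported, though real numbers depend heavily on how the sequences are aligned and how the consensus is built.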
Relatedly, the University of California, Santa Cruz Genomics Institute said late last week that it has posted the complete genome sequence of 2019-nCoV for researchers to use. UCSC Genome Browser engineers are drawing on data deposited in the National Institutes of Health's National Center for Biotechnology Information (NCBI) and processing it into a visual display of the virus.
"When we display coronavirus data in the UCSC Genome Browser, it lets researchers look at the virus' structure and more importantly work with it so they can research how they want to attack it," UCSC Genome Browser Engineer Hiram Clawson said in a statement.
The browser also supports annotation, so researchers can collaborate and share experimental information.