

Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus
Pierre Brézellec
2024
<p>Since the publication of the genome of SARS-CoV-2, the causative agent of COVID-19, in January 2020, many bioinformatic tools have been applied to annotate its proteins. Although efficient methods have been used, such as the identification of protein domains stored in Pfam, most of the proteins of this virus have no detectable homologous protein domains outside the viral taxa. As it is now well established that some viral proteins share similarities with proteins of their hosts, we decided to explore the hypothesis that this lack of homology could be, at least in part, the result of the documented loss of sensitivity of Pfam Hidden Markov Models (HMMs) when searching for domains in "divergent organisms". To improve the annotation of SARS-CoV-2 proteins, we used the HHpred protein annotation tool and an available custom HH-suite database of HMMs specific to Homo sapiens proteins. To avoid "false positive predictions" as much as possible, we designed a robustness procedure to evaluate the HHpred results. In total, 6 robust similarities involving 6 distinct SARS-CoV-2 proteins were detected. Of these 6 similarities, 3 are already known and well documented, and one is in agreement with recent crystallographic results. We then carefully examined the two similarities that have not yet been reported in the literature. First, we show that the C-terminal part of Spike S (the protein that binds the virion to the cell membrane by interacting with the host receptor, triggering infection) has similarities with human prominin-1/CD133; after reviewing what is known about prominin-1/CD133, we suggest that the C-terminal part of Spike S could both improve the docking of Spike S to ACE2 (the main cell entry receptor for SARS-CoV-2) and be involved in the delivery of virions to regions where ACE2 is located in cells.
Second, we show that the SARS-CoV-2 ORF3a protein shares similarities with human G protein-coupled receptors (GPCRs) belonging mainly to the "Rhodopsin family". We conclude that the approach described here (or similar approaches) opens up new avenues of research to better understand SARS-CoV-2 and could be used to complement virus annotations, particularly for less-studied viruses.</p>
https://www.uniprot.org/proteomes/UP000464024
Pfam Domains, HHpred, Hidden Markov Models (HMMs), Bioinformatics, Protein annotation, SARS-CoV-2.
None
Bioinformatics, Evolutionary genomics, Viruses and transposable elements
2023-06-08 10:17:04
Jitendra Narayan