Unbiased cleavage site prediction uncovers viral antagonism of host innate immunity by SARS-CoV-2 3C-like protease

Research Article | COVID-19 | Virology | Open Access | 10.1172/jci.insight.185739

Nora Yucel,1 Silvia Marchiano,2,3,4 Evan Tchelepi,5 Germana Paterlini,6 Ivan A. Kuznetsov,1 Kristina Li,1 Quentin McAfee,1 Nehaar Nimmagadda,1 Andy Ren,1 Sam Shi,1 Alyssa Grogan,7 Aikaterini Kontrogianni-Konstantopoulos,7 Charles Murry,2,3,4,8,9,10 and Zoltan Arany1

1Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

2Institute for Stem Cell and Regenerative Medicine,

3Center for Cardiovascular Biology, and

4Department of Laboratory Medicine & Pathology, University of Washington, Seattle, Washington, USA.

5NetQuest Corporation, Mt. Laurel Township, New Jersey, USA.

6Certusoft, Bloomington, Minnesota, USA.

7Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, Maryland, USA.

8Division of Cardiology, Department of Medicine, University of Washington, Seattle, Washington, USA.

9Sana Biotechnology, Seattle, Washington, USA.

10Department of Bioengineering, University of Washington, Seattle, Washington, USA.

Address correspondence to: Zoltan Arany, Professor in Medicine, Perelman School of Medicine, University of Pennsylvania, Smilow Center for Translational Research, 11th floor, 3400 Civic Blvd., Philadelphia, Pennsylvania 19104, USA. Email: zarany@pennmedicine.upenn.edu.

Published February 23, 2026

Published in Volume 11, Issue 4 on February 23, 2026
JCI Insight. 2026;11(4):e185739. https://doi.org/10.1172/jci.insight.185739.
© 2026 Yucel et al. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Abstract

How SARS-CoV-2 causes a wide range of clinical manifestations and disease severity remains poorly understood. SARS-CoV-2 encodes 2 proteases (3CLPro and PLPro) that are vital for viral production but also promiscuous with respect to host protein targets. Pharmacological inhibition of 3CLPro markedly reduced hospitalization and death in Phase 2/3 clinical studies. Here, we develop a bioinformatic algorithm, leveraging experimental data from SARS-CoV, to predict host cleavage targets of 3CLPro. Our predictions capture previously described targets of SARS-CoV-2 3CLPro, as well as thousands of new putative targets. We validate numerous targets cleaved during infection, including the giant sarcomeric protein obscurin and the innate immune protein OAS1. A long form of OAS1, p46, has been associated in numerous GWAS with reduced COVID-19 severity. We show that 3CLPro cleaves p46 OAS1 immediately upstream of a known prenylation domain, relocalizing OAS1 from subcellular membranes to the cytosol and rendering it akin to the nonprotective, cytosolic p42 isoform. Similar OAS1 relocalization occurs upon infection by SARS-CoV-2. Our data provide a high-throughput resource to identify putative host cleavage targets of 3CLPro and reveal a mechanism by which SARS-CoV-2 antagonizes host innate immunity in individuals with the protective p46 isoform of OAS1.

Introduction

COVID-19 was a leading cause of death and morbidity worldwide from the initial outbreak in Wuhan, China, in December 2019 until 2023 (1). How SARS-CoV-2, the causative agent of COVID-19, produces such a wide range of disease manifestations remains incompletely understood. In addition to lung damage, SARS-CoV-2 infection can cause kidney damage, clotting disorders, loss of taste and smell, cognitive dysfunction, muscle atrophy, and cardiac dysfunction (2–5). Furthermore, long-lasting COVID-19 symptoms, including fatigue, shortness of breath, brain fog, and elevated heart rate, have now been reported in patients up to 5 years after initial illness (6–8). Half a decade later, how SARS-CoV-2 differentially affects the many organs and cell types involved, and which host characteristics determine who will develop severe or long COVID-19, remain poorly understood. A deeper mechanistic understanding of virus-host interactions is thus needed.

Various studies have identified interactions of host proteins with a number of SARS-CoV-2 components, including the spike, envelope, nucleocapsid, and membrane proteins (9–11). These interactions have various consequences, including suppression of the innate immune response, evasion of apoptosis, and reprogramming of host transcription and translation. Beyond protein-protein interactions, important virus-host interactions can also arise from enzymatic cleavage of host proteins by viral proteases. For example, myocardial dysfunction following infection by coxsackievirus B3 (CVB3) can in part be ascribed to cleavage of dystrophin by the viral protease 2A (12, 13); enteroviral 3C proteases can cleave host NLRP1 to trigger inflammasome activation (14); HIV-1 protease mediates apoptosis by cleaving host procaspase 9 and Bcl2 (15, 16); and the Zika virus nsP2 cysteine protease can cleave host proteins SFRP1, NT5M, and FOXG1 (17). Viral proteases thus often modulate pathogenic responses within the host, beyond their direct role in viral replication.

SARS-CoV-2 encodes 2 proteases, a papain-like protease (PLPro) and the 3C-like protease (3CLPro, also known as main protease, MPro, or NSP5). These proteases are highly conserved across coronavirus species and are required for viral replication (18–22), making them attractive targets for antiviral therapies. An early Phase 2/3 clinical trial of Paxlovid, the 3CLPro inhibitor PF-07321332 (nirmatrelvir) administered in combination with ritonavir, revealed an 89% reduction in hospitalization with COVID-19 (23, 24). Paxlovid received Emergency Use Authorization from the Food and Drug Administration in December 2021 for patients at high risk of progression to severe disease. The success of Paxlovid demonstrates the crucial role of 3CLPro in COVID-19 pathology.

Both PLPro and 3CLPro are generated via autocatalytic cleavage from the overlapping ORF1a and ORF1ab polyproteins, the first translation products following SARS-CoV-2 infection. The ORF1a/ab polyproteins encode 16 nonstructural proteins (NSP1–16) that build the viral replication machinery (Figure 1A). The nonstructural proteins are liberated from the ORF1a/ab polyproteins through cleavage by PLPro and 3CLPro, encoded by NSP3 and NSP5, respectively. Following this processing, PLPro remains bound to the endoplasmic reticulum membrane, while 3CLPro releases itself from the membrane into the cytosol. The effects of these proteases, and of 3CLPro in particular, on the host proteome have yet to be fully understood. Targeted screening of 300 IFN-stimulated proteins in cell lines overexpressing SARS-CoV-2 3CLPro identified the ubiquitin protein ligase BRE1A, encoded by the RNF20 gene, as a target of 3CLPro (25). In a separate screen of 71 immune pathway-related proteins, the IFN regulatory transcription factor IRF3 was identified as a target of PLPro, and NLRP12 and TAB1 as targets of 3CLPro, suggesting roles for these proteases in the innate immune response to the virus (26).

Figure 1

Bioinformatic prediction of SARS-CoV-2 3CLPro human protein targets. (A) Diagram of SARS-CoV-2 3C-like protease (3CLPro) function. 3CLPro cleaves at 11 sites within polypeptides pp1a and pp1ab, generated from overlapping reading frames ORF1a and ORF1ab. Cleavage by 3CLPro and papain-like protease (PLPro) liberates nonstructural proteins (NSPs) required for viral function. (B) Identification and scoring of 3CLPro cleavage sites within the human proteome. Scores for each position of the cleavage site (P5–P3′) were obtained from published SARS-CoV 3CLPro data. In total, 195,684 scored cleavage sites (>0) were detected across the human proteome with M, H, or Q at the P1 position. (C) Distribution of all scores (Log10Score). Published cleavage sites are shown in red. (D) Correlation of predicted scores with published Kcat/Km values for SARS-CoV 3CLPro. Scores generated in this study are compared with those from the NetCorona1.0 and 3CLP algorithms. For R2 calculations, the cleavage site between NSP9 and NSP10 was excluded. (E) Receiver operating characteristic (ROC) analysis of the predictive power of Sarsport1.0 versus 3CLP based on scores of published cleavage sites. Cumulative percentage of scores plotted versus score rank (highest score = 1, lowest score = 100), with the area under the curve (AUC) indicated. The 95% CI was determined by the Wilson/Brown method. (F) Secondary structure of high-scoring sites (>0.1) with P1 = Q. A 100-aa window centered on P1 was analyzed by JPRED4 to predict secondary structure (“-” = unstructured, “H” = α-helix, “E” = β-sheet). Highlighted is a predicted site in Cadherin-6 (CADH6) shown in AlphaFold. (G) Fraction of each P1 (Q) within each secondary structure type (unstructured, α-helix, β-sheet). Comparisons are shown for predicted cleavage sites with score > 0.1 versus published cleavage sites versus the published secondary structure distribution of all glutamines. Statistical analysis by χ2 goodness-of-fit test.
(H) Score distribution of published cleavage sites (P1=Q) by secondary structure. Statistics calculated by 1-way ANOVA with Holm-Sidak multiple-comparison test.

Only limited efforts have thus far been made to identify host cleavage targets of SARS-CoV-2 3CLPro systematically and in an unbiased fashion. Given the high conservation of 3CLPro, such an analysis would also extend to other coronavirus species. One approach used N-terminomics to identify neo-N-termini generated by the viral proteases; 14 new cellular substrates (27) and >100 substrates (28) were identified in 2 different global studies using either SARS-CoV-2–infected cell lines or 3CLPro-treated cell lysates. This approach is limited, however, by the need for sufficient protein abundance and appropriate fragment size for detection by mass spectrometry (29, 30). Across these published data sets, only 3 protein targets (TAB1, ATAD2, and NUP107) have been identified by more than 1 study. This lack of overlap reflects the lack of saturation of these experimental approaches. Moreover, cleaved proteins that are subsequently degraded (a process accelerated by infection) (31, 32) also escape detection by N-terminomic approaches. In silico approaches offer the opportunity to overcome these limitations and to avoid laborious experimental screens. An initial such approach (NetCorona1.0) (33) relied on similarity between cleavage sites in the viral polypeptide across divergent human and nonhuman coronavirus species to predict SARS-CoV 3CLPro targets. However, this method often fails to match experimental cleavage studies on SARS-CoV, likely because of the divergent coronavirus species used to identify targets and generate scores. For example, NetCorona1.0 predicts that a sequence containing a proline at the P2′ position can be cleaved, but this substitution has been shown experimentally to block cleavage (34). In addition, NetCorona1.0 does not consider cleavage site accessibility conferred by secondary structure, the relative efficiency of cleavage at different sites, or the possibility that some host target sites have higher affinity than viral sites. Nevertheless, these in silico methods and experimental screens have uncovered several validated host targets of 3CLPro, ranging from proteins of the transcription and translation machinery (28, 35–37) to components of antiviral and immune signaling pathways (26, 38–42). The wide scope of validated targets underscores the critical role of 3CLPro in mediating disease pathology and highlights the need for improved methods to predict host protein cleavage by 3CLPro.

Here we combine published cleavage efficiency data on SARS-CoV 3CLPro, which is 96% similar to SARS-CoV-2 3CLPro (43), with genome-wide secondary structure analyses to identify and score nearly 99,000 predicted SARS-CoV/SARS-CoV-2 3CLPro cleavage sites across the human proteome. Through score filtering and secondary structure analysis, we identify over 1,000 high-likelihood sites. We rediscover nearly all previously identified SARS-CoV-2 3CLPro cleavage sites, and we experimentally validate prominent newly identified targets with purified reagents and in cell culture. We tested cleavage targets in infected cardiomyocytes (CMs) as proof-of-concept validation for large structural proteins, whose size makes them more likely to harbor cleavage sites. We show that 3CLPro expression leads to cleavage and degradation of the giant sarcomeric protein obscurin in human induced pluripotent stem cell–derived CMs (hiPSC-CMs) and recapitulates the sarcomeric disorganization observed with SARS-CoV-2 infection in hiPSC-CMs (38, 44, 45). In addition, we demonstrate degradation of obscurin in SARS-CoV-2–infected hPSC-CMs. We further cross-reference predicted targets with loci identified in genome-wide association studies (GWAS) to identify the innate immune defense protein OAS1 as a predicted 3CLPro target, and we show that 3CLPro cleaves OAS1 directly, leading to its release from intracellular membranes, its primary site of action. Our study provides a comprehensive atlas for identifying the degradome of 3CL proteases, applicable to SARS-CoV-2 and, given the structural conservation of the 3CL protease across coronavirus species (46), to possible future coronavirus outbreaks. Our study also reveals a mechanism by which SARS-CoV-2 antagonizes host innate immunity.

Results

Bioinformatic prediction of SARS-CoV-2 3CLPro targets using experimental data from SARS-CoV 3CLPro. We first sought to identify and score potential cleavage targets of the 3CLPro encoded by SARS-CoV-2. Given the 96% sequence similarity between 3CLPro from SARS-CoV-2 and SARS-CoV, as well as the homology of the viral genome cut sites (18, 25, 46), we developed an algorithm based on experimental data generated previously for SARS-CoV (2003) 3CLPro (33). In this previous study, FRET polypeptides spanning the first endogenous cut site, between NSP4 and NSP5 (P5-SAVLQSGF-P3′), were generated and modified with every possible single amino acid substitution from the P5 to the P3′ position relative to the cleavage site. Cleavage efficiency by 3CLPro was then assessed by fluorescence intensity relative to the consensus cleavage sequence. We leveraged this data set to build a lookup table of the relative efficiency of each amino acid at each position and scored every possible cleavage site by multiplying the relative efficiencies of its 8 residues. This scoring was applied with a sliding 8–amino acid window across the entire human proteome (Figure 1B). Any substitution that showed no detectable cleavage was assigned a value of 0, so a single such residue sets the score of the whole window to 0. Assuming a glutamine (Q) at the P1 position, 98,697 sites with a score > 0 were identified. Expanding our search to include methionine (M) or histidine (H) at P1 uncovered a total of 195,684 sites, with a median score of 0.0008 (Figure 1C and Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/jci.insight.185739DS1). GO analysis of proteins with scores in the top 15% (>0.01) showed enrichment for cell adhesion, morphogenesis, and cytoskeletal genes (Supplemental Table 2). We named the algorithm Sarsport1.0.
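The multiplicative lookup-table scoring described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Sarsport1.0 implementation: the efficiency values produced by the hypothetical helper `toy_table` are placeholders rather than the published SARS-CoV substitution data, and the names `score_window` and `scan_protein` are likewise hypothetical. Each 8-residue window (P5 to P3′) is scored as the product of per-position relative efficiencies, residues missing from the table count as 0 (no detectable cleavage), and only windows with Q, M, or H at P1 are reported.

```python
from typing import Dict, List, Tuple

# Window positions, N- to C-terminal; 3CLPro cuts between P1 and P1'.
POSITIONS = ["P5", "P4", "P3", "P2", "P1", "P1'", "P2'", "P3'"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CONSENSUS = "SAVLQSGF"  # NSP4/NSP5 site used as the reference peptide

Table = Dict[str, Dict[str, float]]


def toy_table() -> Table:
    """HYPOTHETICAL relative efficiencies for illustration only (not the published data)."""
    table = {pos: {aa: 0.1 for aa in AMINO_ACIDS} for pos in POSITIONS}
    for pos, aa in zip(POSITIONS, CONSENSUS):
        table[pos][aa] = 1.0                     # consensus residue = reference efficiency
    table["P1"] = {aa: 0.0 for aa in AMINO_ACIDS}
    table["P1"].update({"Q": 1.0, "M": 0.05, "H": 0.05})
    table["P2'"]["P"] = 0.0                      # proline at P2' blocks cleavage
    return table


def score_window(window: str, table: Table) -> float:
    """Product of per-position relative efficiencies; unlisted residues count as 0."""
    if len(window) != len(POSITIONS):
        raise ValueError("window must span P5..P3' (8 residues)")
    score = 1.0
    for pos, aa in zip(POSITIONS, window):
        score *= table[pos].get(aa, 0.0)
        if score == 0.0:                         # one non-cleavable residue zeroes the site
            break
    return score


def scan_protein(seq: str, table: Table, p1_allowed: str = "QMH",
                 min_score: float = 0.0) -> List[Tuple[int, str, float]]:
    """Slide an 8-residue window along seq; return (1-based P1, window, score) with score > min_score."""
    width = len(POSITIONS)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window[4] not in p1_allowed:          # index 4 of the window is P1
            continue
        s = score_window(window, table)
        if s > min_score:
            hits.append((start + 5, window, s))  # start + 4 is P1 (0-based); +1 for 1-based
    return hits
```

For example, `scan_protein("MKSAVLQSGFRK", toy_table())` returns a single hit, `(7, "SAVLQSGF", 1.0)`, because Q7 is the only allowed P1 residue in that short sequence. Because the score is a product, a single zero-efficiency residue, such as proline at P2′ in `score_window("SAVLQSPF", toy_table())`, removes the window entirely.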

To evaluate the accuracy of Sarsport1.0, we calculated scores for the 11 known 3CLPro cut sites in the SARS-CoV viral genome. Scores ranged from 0.04 to 1.31, all within the top 5% of the score distribution. These scores were then compared with the published relative Kcat/Km values for each cleavage site (47). With the exception of the cut site between NSP9 and NSP10 (ATVRLQ*AGNAT), our calculated scores correlated closely with the relative Kcat/Km determined for the remaining 10 known 3CLPro cleavage sites. In contrast, there was essentially no correlation between Kcat/Km and cleavage scores generated by either NetCorona1.0 or 3CLP, a recently reported machine-learning algorithm (Figure 1D) (39).
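The R² reported for the comparison with Kcat/Km is, for a simple least-squares line, the squared Pearson correlation coefficient. A minimal sketch of computing it with a named site excluded (as was done for NSP9/NSP10) is shown below. The data in the example are toy numbers, not the published scores or Kcat/Km values, the function names are hypothetical, and the sketch works on linear values; whether the original analysis transformed the values is not stated here.

```python
from typing import Iterable, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (equals R^2 of a simple least-squares line)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def r_squared_excluding(names: Sequence[str], scores: Sequence[float],
                        kcat_km: Sequence[float], exclude: Iterable[str] = ()) -> float:
    """R^2 between scores and Kcat/Km after dropping the named sites."""
    drop = set(exclude)
    kept = [(s, k) for name, s, k in zip(names, scores, kcat_km) if name not in drop]
    return r_squared([s for s, _ in kept], [k for _, k in kept])


# Toy example: four sites lie on a line, one ("nsp9/10") is a high-score, low-Kcat/Km outlier.
names = ["a", "b", "c", "d", "nsp9/10"]
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
kcat_km = [2.0, 4.0, 6.0, 8.0, 0.1]
print(round(r_squared_excluding(names, scores, kcat_km), 4))                        # 0.0001
print(round(r_squared_excluding(names, scores, kcat_km, exclude=["nsp9/10"]), 4))   # 1.0
```

The toy example shows why a single discordant site can dominate R² when only 11 points are available, which is the situation the NSP9/NSP10 exclusion addresses.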

To evaluate the sensitivity of Sarsport1.0 in identifying SARS-CoV-2 host protein targets, we next calculated scores for the >100 SARS-CoV-2 3CLPro cleavage targets recently identified via N-terminomics and screening approaches (26–28). Sarsport1.0 identified 106 of the 122 cleavage sites, including those with noncanonical methionine or histidine at the P1 position (Figure 1C). The median score was over 0.1, which is within the top 2.5% of all scores. Receiver operating characteristic (ROC) analysis (Figure 1E) showed Sarsport1.0 to be highly predictive, with an AUC of 0.9449 (P < 0.0001). In comparison, 3CLP identified only 90 sites, in large part because it excludes sites with methionine or histidine at the P1 position. Although experimentally identified sites were enriched for higher 3CLP scores (0.97 for experimental sites versus 0.73 for all identified 3CLP sites), 3CLP scores were less predictive than those of Sarsport1.0, with an AUC of 0.81 (P = 0.02). We surmise that this is likely because 3CLP, like NetCorona1.0, generates likelihood scores based on evolutionary homology to other coronavirus cleavage sites rather than on the relative efficiency of the amino acids at each position. We conclude that Sarsport1.0 is highly predictive of cleavage sites for both SARS-CoV and SARS-CoV-2 3CLPro.
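An ROC AUC of the kind reported above can be computed without tracing the curve, using the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive site outscores a randomly chosen background site, with ties counted as one half. The sketch below assumes that the positives are the published cleavage sites and the background is all other scored sites; the authors' exact construction (cumulative percentage versus score rank, Figure 1E) may differ in detail, and the function name `rank_auc` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def rank_auc(positive: Sequence[float], background: Sequence[float]) -> float:
    """AUC = P(pos > bg) + 0.5 * P(pos == bg), via Mann-Whitney counting."""
    if not positive or not background:
        raise ValueError("both groups must be non-empty")
    bg = sorted(background)                   # sort once; each lookup is then O(log n)
    wins = 0.0
    for p in positive:
        below = bisect_left(bg, p)            # background scores strictly below p
        tied = bisect_right(bg, p) - below    # background scores equal to p
        wins += below + 0.5 * tied
    return wins / (len(positive) * len(bg))
```

For example, `rank_auc([0.9, 0.8], [0.1, 0.2, 0.3])` gives 1.0 (every positive outranks every background site), while `rank_auc([0.5], [0.5])` gives 0.5. Sorting the background once keeps the cost manageable for a background on the order of 10^5 sites; scikit-learn's `roc_auc_score` should give the same value when supplied with 0/1 labels and the pooled scores.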

Refinement of cleavage prediction by secondary structure analysis. The NSP9/10 cleavage site was unique in combining a high score with a low Kcat/Km (Figure 1D), suggesting that higher order structure, not captured by scoring based on primary sequence alone, might inhibit cleavage. We therefore estimated the secondary structure of each cut site in the viral genome using the JPRED4 protein secondary structure prediction server (48) and a 100 amino acid (aa) window centered on the P1 position. The NSP9/10 site in SARS-CoV was the only cleavage site where the P1 position (Q) was predicted to lie in a β-sheet (Supplemental Table 3). In contrast, all other sites lay in predicted α-helices or disordered regions, structures known to be more accessible to proteases (49, 50). These data suggest that higher order structures such as β-sheets hinder cleavage by 3CLPro.
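Extracting the 100-aa window around P1 for secondary structure prediction can be sketched as below. How the original analysis handled sites near a protein terminus is not stated; this sketch assumes the window is shifted inward so that it stays within the sequence (and is shorter only for proteins under 100 aa). The returned offset locates P1 within the window, which is what is needed to read off the predicted state at P1. The function name `centered_window` is hypothetical.

```python
from typing import Tuple


def centered_window(seq: str, p1: int, width: int = 100) -> Tuple[str, int]:
    """Return (window, p1_offset) for a 1-based P1 position in seq."""
    if not 1 <= p1 <= len(seq):
        raise ValueError("P1 position outside sequence")
    i = p1 - 1                            # 0-based index of P1
    start = max(0, i - width // 2)        # ideally centre the window on P1
    end = min(len(seq), start + width)    # clip at the C-terminus
    start = max(0, end - width)           # shift left if clipped, keeping full width when possible
    return seq[start:end], i - start
```

For a site far from either terminus, P1 sits at offset 50 of the 100-residue window; near a terminus the window keeps its full length and the offset moves instead, so the predicted state at P1 can always be read as `predicted_ss[offset]`.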

To further probe this possibility, we used JPRED4 to evaluate secondary structures around all predicted cleavage sites with a Q at P1 and a Sarsport1.0 score > 0.1, totaling 4,416 sites (Figure 1F and Supplemental Table 4). The recent publication of predicted structures for most of the human proteome with AlphaFold (51) also provides the opportunity to cross-validate predicted secondary structures against higher-order structures. The relative frequency of β-sheet structures at the P1 position of predicted cleavage sites was significantly lower than the overall frequency of β-sheets for Qs in the proteome (52) (Figure 1G), indicating that Sarsport1.0 scoring is already partly biased away from β-sheets. Analysis of experimentally identified, published cleavage sites revealed a further significant enrichment for cleavage in regions where P1 (Q) is unstructured, and in particular not in a β-sheet (Figure 1G). Thus, filtering Sarsport1.0 results for the absence of a β-sheet at P1 should improve its positive predictive value. Interestingly, the median Sarsport1.0 score for sites in unstructured regions (0.1024) was significantly lower than for sites in α-helices (0.1727) or β-sheets (0.27) (Figure 1H), suggesting that a less permissive secondary structure imposes stronger evolutionary pressure for an optimal primary-sequence cleavage motif.
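The filtering step proposed above, together with the per-structure score comparison, can be expressed as a short sketch. It assumes each predicted site has already been annotated with its P1 residue, Sarsport1.0 score, and predicted P1 state (for example, from a window lookup against JPRED4 output); the `Site` record, its field names, the state codes, and the toy values are hypothetical.

```python
"""Sketch: filter predicted sites by P1 residue, score, and P1 secondary structure."""
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Site:
    protein: str
    p1_residue: str
    score: float
    p1_state: str  # assumed codes: "H" helix, "E" beta-sheet, "-" unstructured


def candidate_sites(sites: List[Site], min_score: float = 0.1) -> List[Site]:
    """Keep Q-at-P1 sites scoring above `min_score` whose P1 is not in a beta-sheet."""
    return [
        s for s in sites
        if s.p1_residue == "Q" and s.score > min_score and s.p1_state != "E"
    ]


def beta_fraction(sites: List[Site]) -> float:
    """Fraction of sites whose P1 residue is predicted to lie in a beta-sheet."""
    if not sites:
        return 0.0
    return sum(s.p1_state == "E" for s in sites) / len(sites)


def median_score_by_state(sites: List[Site]) -> Dict[str, float]:
    """Median score for each predicted P1 state."""
    by_state: Dict[str, List[float]] = defaultdict(list)
    for s in sites:
        by_state[s.p1_state].append(s.score)
    return {state: median(scores) for state, scores in by_state.items()}


if __name__ == "__main__":
    toy = [
        Site("A", "Q", 0.30, "E"),
        Site("B", "Q", 0.15, "H"),
        Site("C", "Q", 0.12, "-"),
        Site("D", "M", 0.20, "-"),
        Site("E", "Q", 0.05, "-"),
    ]
    print([s.protein for s in candidate_sites(toy)])  # ['B', 'C']
    print(beta_fraction(toy))  # 0.2
    print(median_score_by_state(toy))
```

Comparing `beta_fraction` between the filtered predictions and all Q residues in the proteome mirrors, in spirit, the frequency comparison described in the text; any significance test would be a separate step.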

Cleavage validation of targets. Because of the higher sensitivity of our method, we identified numerous new predicted cleavage sites in addition to those previously published. Gene Ontology (GO) analysis of proteins with Sarsport1.0 score > 0.01 showed enrichment for cytoskeletal, cell motility, and cell adhesion proteins, including several predicted cleavage sites located within homologous cadherin domains of the cadherin protein superfamily (Supplemental Table 2). AlphaFold predicted these sites to lie in unstructured, accessible loops within the cadherin domain, making them likely to be cleaved if exposed to 3CLPro (Figure 1, G and H). We validated these hits in vitro by incubating purified 3CLPro with commercially available recombinant cadherin proteins (CDH6, CDH20), which share an identical predicted site (score 0.145; Q203 and Q209, respectively; Figure 2A). 3CLPro efficiently cleaved both CDH6 and CDH20, yielding the fragment sizes expected from the predicted cleavage site (Figure 2A). We similarly validated cleavage sites in thrombin (IIa) and in the intracellular domain of NOTCH1: in vitro reactions with both purified proteins yielded the expected fragment sizes (Figure 2, B–D). Appearance of the cleaved thrombin (IIa) product was blocked by the 3CLPro inhibitor GC376, demonstrating that cleavage requires 3CLPro enzymatic activity (Figure 2B). Moreover, cleavage of NOTCH1 at a predicted site within its intracellular domain (Q2315, score 0.432) yielded both predicted fragments (Figure 2C). Expression of 3CLPro, compared with the catalytically inactive C145A mutant, in hiPSC-CMs also yielded NOTCH1 fragments of the predicted length, demonstrating cleavage in intact cells (Figure 2D). Additional targets chosen for their high scores and accessible secondary structure (SVIL, UACA, NOTCH2) were similarly validated by 3CLPro expression in 293T cells, as was the previously published target TAB1 (Supplemental Figure 1A).
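The expected fragment sizes used for validation follow from simple arithmetic on the predicted site, since 3CLPro cuts between P1 and P1'. A minimal sketch, assuming an average residue mass of about 110 Da (a common rule of thumb) and ignoring tags, glycosylation, and other modifications: for the NOTCH1 construct spanning aa 2280–2550 cleaved after Q2315, it gives about 4 and 26 kDa, roughly consistent with the ~4 kDa and ~25 kDa fragments described for Figure 2C. The function name and constant are hypothetical.

```python
"""Sketch: approximate fragment sizes for cleavage after a P1 residue."""
from typing import Tuple

AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass in daltons (assumption)


def fragment_sizes_kda(start: int, end: int, p1: int) -> Tuple[float, float]:
    """Approximate (N-terminal, C-terminal) fragment masses in kDa.

    `start` and `end` are the first and last residues of the construct
    (inclusive, full-length numbering); cleavage occurs between P1 and P1'.
    Tags and post-translational modifications are ignored.
    """
    if not start <= p1 < end:
        raise ValueError("P1 must lie within the construct, before its last residue")
    n_len = p1 - start + 1  # residues start..P1
    c_len = end - p1        # residues P1'..end
    return n_len * AVG_RESIDUE_DA / 1000, c_len * AVG_RESIDUE_DA / 1000


if __name__ == "__main__":
    n_kda, c_kda = fragment_sizes_kda(2280, 2550, 2315)  # NOTCH1 construct, Q2315
    print(f"N-terminal ~{n_kda:.2f} kDa, C-terminal ~{c_kda:.2f} kDa")
    # N-terminal ~3.96 kDa, C-terminal ~25.85 kDa
```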
In a number of these experiments, expression of 3CLPro reduced levels of the full-length target proteins, but cleavage fragments were not always detected, suggesting that cytosolic fragments generated by 3CLPro may be further degraded by endogenous proteolytic pathways. Supporting this notion, cleavage of full-length NOTCH1 yielded the expected 90 kDa plasma membrane-bound N-terminal fragment, whereas for the C-terminal, cytosolic portion only a reduction in total protein levels was observed (Supplemental Figure 1B). We conclude that 3CLPro can cleave a wide range of host proteins and that the resulting cytosolic fragments are likely often degraded by endogenous pathways.

Figure 2. Validation of predicted protein targets.

(A) Western blot of in vitro cleavage of recombinant cadherins with 3CLPro. Cleavage sites within the recombinant fragment are shown, with amino acid positions numbered according to the full-length proteins. Western blots show staining against the C-terminus (His-Tag) of each protein and against 3CLPro. Recombinant proteins are a mixture of glycosylated (~90 kDa) and unglycosylated (~65 kDa) forms, corresponding to cleavage fragments of ~62 kDa and ~40 kDa, respectively. For CADH6 cleavage, 2 formulations of 3CLPro were tested (unconjugated 3CLPro and 3CLPro conjugated to maltose-binding protein) at 2 protease concentrations (+, 0.5 μM; ++, 1 μM). For CADH20 cleavage, only digests with unconjugated 3CLPro at 1 μM are shown. For 3CLPro and CADH20 staining (His-Tag), samples were run on the same gel but are noncontiguous (indicated by the line separating lanes). (B) Western blot of in vitro cleavage of purified human α-thrombin (IIa). The diagram shows amino acid positions of unprocessed prothrombin and the position of the cleavage site relative to the epitope of the antibody used for Western blotting. Purified 3CLPro (1 μM) was incubated with 2 μg α-thrombin overnight under reducing conditions, with or without the 3CLPro inhibitor GC376 (1 μM). (C) In vitro cleavage of a purified recombinant NOTCH1 fragment (aa 2280–2550) with an N-terminal His-Tag. Reactions were performed with 1 μM purified 3CLPro for 1 hour. The diagram shows the position of cleavage within the NOTCH1 fragment, with amino acid positions corresponding to the full-length protein, and the epitope region of the antibody, which lies C-terminal to the cleavage site. The full-length fragment is ~29 kDa, with N- and C-terminal fragments of ~4 kDa and ~25 kDa, respectively. Samples were run on the same gel but are noncontiguous. (D) Cellular cleavage of NOTCH1. Western blots show lysates of hiPSC cardiomyocytes expressing 3CLPro or the catalytically inactive C145A variant for 48 hours. The diagram shows the position of the cleavage site within the intracellular fragment of NOTCH1 and the epitope of the antibody used for Western blotting.

Cardiac targets of SARS-CoV-2 3CLPro show multiple cut sites across sarcomeric proteins. Previous work has demonstrated disorganization of sarcomeres after SARS-CoV-2 infection of hPSC-CMs (38, 44, 45, 53). We hypothesized that 3CLPro might directly degrade sarcomere proteins. Consistent with this notion, expression of 3CLPro, but not a catalytically inactive mutant (C145A), in hiPSC-CMs led to pronounced sarcomere breakdown within 48 hours (Figure 3A). At this 48-hour time point, we also observed numerous cells with a stereotypical intermediate phenotype in which sarcomere length, defined as the distance between α-actinin-stained Z-discs, was increased (Figure 3, B and C), suggesting that key structural protein(s) of the sarcomere may be targeted by 3CLPro.
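Because sarcomere length is defined here as the distance between α-actinin-stained Z-discs, it can be estimated from an intensity profile drawn along a myofibril. The sketch below is one simple way to do this and is not necessarily the quantification used for Figure 3: it treats interior local intensity maxima above a threshold as Z-discs and converts their mean spacing to micrometers. The function name, the threshold parameter, the pixel size, and the synthetic profile are assumptions for illustration.

```python
"""Sketch: sarcomere length from an alpha-actinin line-intensity profile."""
import math
from statistics import mean
from typing import Sequence


def sarcomere_length_um(
    profile: Sequence[float], pixel_size_um: float, min_height: float
) -> float:
    """Mean Z-disc spacing in micrometers, or NaN if fewer than 2 peaks are found.

    Z-discs are taken to be interior local maxima at or above `min_height`;
    on a flat-topped peak only the leftmost sample is counted.
    """
    peaks = [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
        and profile[i] >= min_height
    ]
    if len(peaks) < 2:
        return float("nan")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    return mean(spacings) * pixel_size_um


if __name__ == "__main__":
    # Synthetic profile with one Z-disc every 18 pixels at 0.1 um per pixel.
    synthetic = [0.5 + 0.5 * math.cos(2 * math.pi * x / 18) for x in range(200)]
    print(round(sarcomere_length_um(synthetic, 0.1, 0.8), 3))  # 1.8
```

Comparing this value between control and 3CLPro-expressing cells would give a per-cell readout of the lengthened sarcomeres described above; real images would also need background correction and a line drawn along the myofibril axis.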

Figure 3. Sarcomere breakdown with 3CLPro expression.
