SARS-CoV-2 infection induces a long-lived pro-inflammatory transcriptional profile

Single-cell RNA-seq datasets

The DILfrequency dataset [8] contains single-cell sequencing data of 39 samples from 13 adult participants receiving low-dose IL-2 immunotherapy. For each participant, three longitudinal samples were taken on day 0 (before the first IL-2 injection), day 27 (before the last IL-2 injection) and day 55 (4 weeks after the last IL-2 injection). Each sample contained five major cell types — CD4+ regulatory T cells (Treg), CD4+ conventional T cells (Tconv), CD8+ T cells, CD56bright NK cells and CD56dim NK cells — isolated by fluorescence-activated cell sorting (FACS) from peripheral blood mononuclear cells (PBMCs) and labelled with antibody-derived tags. Only day 0 and day 55 samples were analysed in the current study, with one sample (day 55 sample from participant P8) excluded due to quality control reasons previously described [4]. The DILfrequency dataset also contained cells stimulated with phorbol myristate acetate (PMA)/ionomycin, which were excluded from the current study.

The COMBAT dataset [9] contains single-cell sequencing data from PBMC isolated from 64 hospitalised COVID-19 patients, 13 non-hospitalised (community) COVID-19 patients, 12 critically ill influenza patients recruited from the intensive care unit (ICU), 23 hospitalised patients with all-cause sepsis and ten healthy control participants. PBMCs were classified into different cell types based on their gene expression clusters, protein markers and B- and T-cell receptor sequencing data. Pseudo-bulk samples with less than 2000 total transcript counts or less than 500 IL2-AIS transcript counts were excluded. This resulted in the exclusion of one critical COVID-19 patient and one community COVID-19 patient.

The INCOV dataset [10] contains single-cell sequencing data of 451 samples from 178 COVID-19 patients, with up to three longitudinal samples taken from each patient. Each sample contained PBMCs, with cell type information annotated bioinformatically based on clustering results. Twenty-four samples from ten immunocompromised or immunosuppressed patients were excluded from the current study.

The baseline characteristics of the participants in each study are summarised in Additional file 1: Table S1. The total mRNA counts in each pseudo-bulk sample are summarised in Additional file 2: Table S2.

Generation and normalisation of pseudo-bulk expression data

Pseudo-bulk expression matrices were generated for each dataset by aggregating mRNA counts from cells from the same donor, time point (if applicable) and major cell type. For the DILfrequency dataset, which was based on a custom mRNA panel of 585 transcripts, pseudo-bulk samples with less than 100 total mRNA counts were removed. For the COMBAT and INCOV datasets, which were based on whole-transcriptome sequencing, pseudo-bulk samples with less than 2000 total mRNA counts were removed. The pseudo-bulk expression matrices for each dataset were normalised by dividing raw counts with sample-specific scale factors calculated using the median-of-ratios method previously described [11].

Differential expression analyses

Differential expression analyses were performed separately for each dataset based on the pseudo-bulk expression matrix using DESeq2 [12]. For the DILfrequency dataset, the likelihood ratio test was used, with a full model including time points and the participants as independent variables, and a reduced model including only participants as the independent variable. For the COMBAT dataset, the Wald test was used, with patient groups (COVID-19 or healthy control) as the independent variable. For the INCOV dataset, considering the samples were taken during a wide range of time post COVID-19 symptoms, for each participant, we assigned the earliest sample taken 0–14 days post COVID-19 symptoms as the acute phase sample, and the earliest sample taken 29–84 days post COVID-19 symptoms as the post-acute phase sample. Only participants that have both acute and post-acute phase samples available were included in the differential expression analysis. The cutoff time points for acute and post-acute phase samples were selected to maximise the number of available participants, while minimising the heterogeneity within each group. The likelihood ratio test was used, with a full model including sampling time points (acute or post-acute phase) and participants as independent variables, and a reduced model including only participants as the independent variable. For all datasets, the apeglm method [13] was applied to shrink the resulting fold change values, and the Benjamini–Hochberg FDR correction was applied after pooling all resulting P values.

Deriving the IL-2 induced anti-inflammatory signature score

The Day 55 signature induced by low-dose IL-2 treatment reported previously for the DILfrequency study [4] is referred to here as “Anti-Inflammatory Signature induced by IL-2 treatment (IL2-AIS)”. For the DILfrequency dataset, the IL2-AIS scores were calculated as previously described based on the normalised pseudo-bulk expression levels of the 20 upregulated signature genes (CISH, TNFSF14, OAS1, GIMAP7, GIMAP5, TNFSF10, TAGAP, STAT1, MYC, FASLG, CX3CR1, PTGDR2, CRTAM, EOMES, IL32, CCR10, CCR1, CXCR1, CD40LG and ID3) and the 21 downregulated signature genes (AREG, DUSP5, TNFAIP3, RGS1, CXCR4, DUSP2, DUSP4, DDIT4, NFKBIA, FOSL2, NFKBIZ, ZBTB16, SLC2A3, BTG2, SOX4, OSM, SGK1, TGFBR3, OTUD1, COLQ and CCL5). For the COMBAT and INCOV datasets, a similar approach was applied to calculate the IL2-AIS scores for each pseudo-bulk sample. Specifically, z-scores of normalised expression levels of the 41 signature genes were first calculated for each pseudo-bulk sample within each major cell type. The IL2-AIS scores were then derived as the sum of z-scores of upregulated signature genes, subtracted by that of downregulated signature genes. As the z-scores were calculated from samples within a dataset, one important limitation of this definition was that comparisons of IL2-AIS scores were only allowed within the same dataset, but not across different datasets.

Modelling the dynamics of the IL2-AIS scores in the INCOV cohort

For the INCOV cohort, where multiple longitudinal samples are available for most individuals, we modelled the IL2-AIS scores using a Bayesian linear model to account for the inter-individual variation:

$$_ \sim \mathrm\left(_, \sigma \right)$$

$$_ \sim \mathrm\left(0, 15\right)\;\mathrm\;i=1..168$$

$$_ \sim \mathrm\left(0, 10\right)\;\mathrm\;t=1..7$$

$$\sigma \sim \mathrm\left(0, 5\right)$$

where \(S\) is the observed IL2-AIS scores, \(i\) is the index of the individual and \(t\) is the index of sampling time represented as a categorical variable with seven levels: 0–7 days, 8–14 days, 15–28 days, 29–60 days, 61–90 days, 91–150 days and ≥ 151 days. \(_\) and \(_\) represent the individual-specific effect and the effect of sampling time, respectively. As the IL2-AIS score was formulated as the sum of z-scores of the 41 IL2-AIS genes, \(\mathrm(S)=0\). Therefore, an intercept term for \(_\) was not included. We interpreted \(_\) as the time-adjusted IL2-AIS score of individual \(i\) and \(_\) as the expected IL2-AIS score given sampling time \(t\). Regularising priors were used for \(_\) and \(_\). Posterior mean values and 95% confidence intervals of parameters were estimated using a Markov Chain Monte Carlo approach implemented in Turing.jl [14].

NanoString transcriptomic data analysis

The processed NanoString bulk transcriptomic data for the Gedda et al. (2022) cohort [7] was accessed from Gene Expression Omnibus [15, 16]. Among the 162 convalescent COVID-19 participants, 23 were excluded due to the lack of a precise date of onset of COVID-19 symptoms. All 40 healthy control participants were included. Among the 785 genes profiled in the dataset using the NanoString nCounter Human Host Response panel, 16 IL2-AIS genes were present, including 12 upregulated genes (MYC, CXCR1, OAS1, TNFSF10, FASLG, CCR10, STAT1, CX3CR1, CD40LG, IL32, EOMES and CCR1) and four downregulated genes (OSM, SLC2A3, CXCR4 and CCL5). The IL2-AIS* scores were defined for each sample as the sum of z-scores of normalised transcription levels of upregulated IL2-AIS genes, subtracted by that of downregulated IL2-AIS genes.

STRING network analysis

From the 1419 constituent genes of COMBAT Component 187, the top 50 genes with the highest loading scores were selected for STRING protein interaction network analysis [17]. Given the very strong correlation between the relative contribution of the same core set of genes to both the IL2-AIS and Component 187, the selection of the top 50 genes of Component 187 not only facilitated the visualisation of the gene network, but also allowed to focus on the main biological pathways contributing specifically to the IL2-AIS identified in this study. A gene network and pathway analysis on the full 1419 constituent genes of Component 187 is provided in the COMBAT study [5]. Physical and functional interactions identified from text mining, experimental evidence, annotated databases and co-expression were used. The minimum required interaction score was set to 0.15. The resulting protein interaction network can be accessed using the following link: https://version-11-5.string-db.org/cgi/network?networkId=bU31lXEkAFvX.

Comments (0)

No login
gif