A gene-expression signature defines a subtype of Stomach Adenocarcinomas with low levels of Claudins and a high ratio of NF-YA long/NF-YA short splicing variants

A novel classification of Claudinlow STADs with the use of the 158-gene signature based on the NF-YAl/NF-YAs expression ratio

We previously classified all STAD tumors in the TCGA database according to the 4 TCGA and ACRG subtypes [29] and complemented this with a fifth Claudinlow subtype defined on the basis of the 24-gene transcriptomic signature described by Nishijima [18].

The number of Claudinlow tumors (n = 79) is comparable to that of the other 4 STAD subtypes (EMT: 61; MSI: 82; MSS/TP53−: 104; MSS/TP53+: 73) [24]. As expected, the predicted Claudinlow tumors express significantly lower amounts of CLDN7 than the MSI, MSS/TP53−, and MSS/TP53+ tumors (Fig. S1). A similar pattern is observed for CLDN4 when Claudinlow tumors are compared with MSI and MSS/TP53− tumors. Surprisingly, no statistically significant difference in the levels of the CLDN3, CLDN4, and CLDN7 mRNAs is detected between Claudinlow and EMT tumors. In addition, CLDN3 levels do not differ among Claudinlow, EMT, MSI, MSS/TP53−, and MSS/TP53+ tumors. Overall, our data indicate that the Claudinlow denomination applied to the subgroup of STAD tumors defined by the 24-gene signature is incorrect, as it does not accurately reflect the CLDN3, CLDN4, and CLDN7 expression levels.

Interestingly, our transcriptomic analyses indicate that the so-called Claudinlow subgroup can be separated from all the other STAD tumor subtypes with the use of the NF-YAl/NF-YAs expression ratio. Indeed, the NF-YAl/NF-YAs median values are significantly higher in Claudinlow tumors relative to the EMT, MSI, MSS/TP53−, and MSS/TP53+ counterparts (Fig. S1). On the basis of this observation, we used the 158-gene signature [24], which was derived by WGCNA (Weighted Gene Co-expression Network Analysis) and is associated with a high NF-YAl/NF-YAs ratio in both BRCA and STAD tumors (Supplementary Table 2). A hierarchical clustering of STAD samples based on the 158-gene signature permits a novel definition of the Claudinlow subset, reducing the original number of Claudinlow tumors from 79 to 56 (Supplementary Table 3 and Fig. 1a). According to the ACRG classification, most of the 79 Claudinlow tumors based on the Nishijima classification originate from the EMT subgroup, while a minor portion derives from other subgroups (Fig. 1b). By contrast, the new classification indicates that all 56 newly identified Claudinlow tumors originate from the EMT subgroup of STADs (Fig. 1b). Figure 1c shows boxplots of the expression of the mRNAs coding for CLDN3, CLDN4, CLDN7, NF-YAl, and NF-YAs, together with the NF-YAl/NF-YAs ratio, in our panel of STADs classified according to the 158-gene signature. As expected, Claudinlow tumors show significantly lower levels of the CLDN3, CLDN4, and CLDN7 mRNAs relative to the other subgroups. The exception is CLDN3, whose average expression levels are comparable in Claudinlow and EMT tumors. In addition, the Claudinlow specimens present the expected increase of the NF-YAl/NF-YAs ratio, due to a statistically significant increase in NF-YAl levels and a corresponding decrease in NF-YAs (Fig. 1c).

Looking at WHO ICD-O-3 categories (International Classification of Diseases for Oncology, 3rd Edition, https://www.who.int/standards/classifications/other-classifications/international-classification-of-diseases-for-oncology), we noticed that Claudinlow tumors were significantly enriched in the mucinous adenocarcinoma fraction (n = 6 out of 18 total, Fisher exact test p value = 1.12 × 10⁻²), while being depleted of tubular adenocarcinomas (n = 2 out of 71, p value = 2.29 × 10⁻³) (Fig. 1d). Mucinous adenocarcinomas are characterized by an increased invasion capacity, and they are localized in the upper part of the stomach [38].
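
The enrichment test reported above can be illustrated with a minimal sketch of a two-sided Fisher exact test on a 2 × 2 table (Claudinlow vs. other tumors, mucinous vs. non-mucinous). The counts 6, 18 and 56 come from the text; the total number of ICD-O-3 annotated tumors is not stated here, so `ASSUMED_TOTAL` is a hypothetical placeholder and the resulting p value is not expected to match the published one. The function name and table layout are illustrative choices, not the authors' code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Counts taken from the text: 6 of the 18 mucinous tumors are Claudinlow,
# and the Claudinlow group contains 56 tumors.
CLAUDIN_LOW_TOTAL = 56
MUCINOUS_TOTAL = 18
MUCINOUS_CLAUDIN_LOW = 6
ASSUMED_TOTAL = 400  # hypothetical cohort size, NOT given in the source

a = MUCINOUS_CLAUDIN_LOW                     # Claudinlow, mucinous
b = CLAUDIN_LOW_TOTAL - a                    # Claudinlow, non-mucinous
c = MUCINOUS_TOTAL - a                       # other subtypes, mucinous
d = ASSUMED_TOTAL - CLAUDIN_LOW_TOTAL - c    # other subtypes, non-mucinous
p_example = fisher_exact_two_sided(a, b, c, d)
```

With the real annotated cohort size in place of `ASSUMED_TOTAL`, the same call would give the enrichment p value; the tubular adenocarcinoma depletion can be tested by swapping in the corresponding counts.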

Fig. 1

Re-definition of the Claudinlow subtype of the TCGA STAD tumors. a SigClust2 hierarchical clustering of all STAD tumors of TCGA according to our 158-gene signature. Top: heatmap depicting the expression of signature genes within the tumor groups defined by hierarchical clustering. The proposed Claudinlow node is highlighted by a black square. Bottom: the dotplot shows an aggregate view of the signature-gene expression in STAD tumors, as calculated by the median z-score. b Proportion of samples included in the previous (n = 79, Left) and current (n = 56, Right) Claudinlow groups according to their ACRG classification. c Box plots showing the CLDN3, CLDN4, CLDN7, NF-YAl and NF-YAs expression levels as well as the NF-YAl/NF-YAs expression-ratio values of the STAD tumors in the TCGA dataset. The values are shown in TPM (Transcripts Per Million). Tumor types are defined according to the new molecular classification. d Barplot showing the distribution of the ACRG + Claudinlow subtypes across WHO ICD-O-3 categories for gastric cancer, expressed as percentages of total samples within each subtype. NOS = not otherwise specified

In conclusion, our results support the appropriateness of the 158-gene signature to identify the Claudinlow subset of gastric cancers.
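
As an illustration of the two quantities used throughout this section, the following minimal sketch computes a per-sample signature score as the median z-score of the signature genes (the aggregate shown in Fig. 1a) and a per-sample NF-YAl/NF-YAs ratio. It makes several assumptions not stated in the text: z-scores are taken per gene across samples using the population standard deviation, no log transformation is applied, and a pseudocount of 1 protects the ratio against division by zero. Gene names and values are invented toy data.

```python
from statistics import mean, median, pstdev

def zscore_rows(expr):
    """Standardize each gene across samples. expr: {gene: [value per sample]}."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no information; score it as 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out

def signature_scores(expr, signature):
    """Median z-score of the signature genes, one value per sample."""
    z = zscore_rows({g: expr[g] for g in signature if g in expr})
    if not z:
        raise ValueError("none of the signature genes is present in the data")
    n_samples = len(next(iter(z.values())))
    return [median(z[g][i] for g in z) for i in range(n_samples)]

def isoform_ratio(long_tpm, short_tpm, pseudocount=1.0):
    """Long/short isoform ratio per sample; the pseudocount is an assumption."""
    return [(l + pseudocount) / (s + pseudocount)
            for l, s in zip(long_tpm, short_tpm)]

# Toy data: 3 hypothetical genes x 4 samples (invented TPM-like values).
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 10.0],
    "GENE_B": [5.0, 5.0, 6.0, 20.0],
    "GENE_C": [0.0, 1.0, 1.0, 8.0],
}
scores = signature_scores(toy_expr, ["GENE_A", "GENE_B", "GENE_C"])
ratios = isoform_ratio([4.0, 9.0], [4.0, 1.0])
```

In this toy example the fourth sample has the highest signature score, which is the kind of sample that would fall in a signature-high cluster.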

Patterns of genes differentially expressed in the Claudinlow subset of gastric tumors

We used RNA-seq data from the TCGA database to identify DEGs (Differentially Expressed Genes) in the Claudinlow subset of STADs with a “pairwise-comparisons” approach (Supplementary Table 4). The numbers of upregulated and downregulated DEGs determined in each comparison of subsets are shown in Fig. 2a. The minimum number of exclusive DEGs is observed in the Claudinlow/EMT comparison (1189 upregulated and 1031 downregulated genes). The other Claudinlow comparisons show a higher number of exclusive DEGs (> 2000 upregulated genes and > 1491 downregulated genes). The upregulated and downregulated DEGs emerging from each comparison can be grouped under several GO (Gene Ontology) terms whose enrichment is characterized by a different degree of significance (Fig. 2b). As for the downregulated DEGs, the most significant enrichment values are observed for terms describing various mitotic processes. It is worth mentioning that all the comparisons show a significant enrichment of downregulated genes involved in developmental processes, such as establishment of skin barrier, keratinization, epidermis development and epithelial cell differentiation. This is particularly relevant for the comparisons with the MSS/TP53− and MSS/TP53+ subtypes, which are characterized by an epithelial phenotype. As for the upregulated DEGs, the enrichment in genes participating in processes such as muscle contraction, angiogenesis and cell–cell adhesion, which are typical of cells with a mesenchymal phenotype, is of major interest: these processes are particularly enriched in the comparison between the Claudinlow and the EMT tumor subtypes.
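
The notion of DEGs that are "exclusive" to one pairwise comparison, as opposed to shared among comparisons (the intersections drawn in Fig. 2a), can be sketched with plain set operations. The comparison labels and gene identifiers below are hypothetical placeholders; the actual DEG lists are in Supplementary Table 4.

```python
def exclusive_degs(deg_sets):
    """For each comparison, keep the genes found in that comparison only."""
    exclusive = {}
    for name, genes in deg_sets.items():
        # Union of the DEGs of every other comparison.
        others = set().union(*(g for n, g in deg_sets.items() if n != name))
        exclusive[name] = genes - others
    return exclusive

def shared_by_all(deg_sets):
    """Genes called differentially expressed in every comparison."""
    return set.intersection(*deg_sets.values())

# Hypothetical upregulated DEGs of Claudinlow tumors versus three other subtypes.
toy_up = {
    "vs_EMT": {"G1", "G2", "G3"},
    "vs_MSI": {"G2", "G3", "G4"},
    "vs_MSS_TP53neg": {"G3", "G5"},
}
toy_exclusive = exclusive_degs(toy_up)
toy_shared = shared_by_all(toy_up)
```

Counting the members of each exclusive set gives the per-comparison numbers of the kind quoted above, while the shared set corresponds to the intersection common to all comparisons.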

Fig. 2

Differentially Expressed Genes in Claudinlow STAD tumors showing a high NF-YAl/NF-YAs expression ratio. a The plots show the number of upregulated (Top) and downregulated (Bottom) genes in Claudinlow samples as compared to the other STAD subtypes. In addition, the panel illustrates each possible intersection among the different comparisons. b The heatmap depicts the enrichment of the Gene Ontology (GO) terms in upregulated (green) and downregulated (red) genes determined by the comparisons shown in (a)

To better define the differences among the 5 subgroups in TCGA, and to verify the partitioning of the EMT and Claudinlow clusters, we performed Principal Component Analysis (PCA). The results are shown in Fig. S2a: while the epithelial subgroups largely cluster together, EMT samples are distinct, and Claudinlow samples are further separated. Fig. S2b shows a Volcano plot of the genes differentially expressed between EMT and Claudinlow samples, with several genes showing a clear differential expression between the two subgroups. This is reflected in the Gene Ontology and Reactome Pathways plots of up- and downregulated genes in the two cohorts (Fig. S2c), which show highly enriched terms of cell adhesion and muscle contraction in Claudinlow, and of cell division and mitotic cell cycle in EMT.

Use of the 158-gene signature to classify additional STAD cases

We wished to verify the TCGA results using two additional RNA-seq datasets, PRJNA764173 (231 patients) and PRJNA1119255 (60 patients), both of Asian origin (Fig. 3a). Exploiting the 158-gene signature, hierarchical clustering of the 291 samples yielded a Claudinlow node with 35 samples (Fig. 3b). We remark that the incidence of this cluster in these datasets (12%) is similar to that of the Claudinlow cluster in TCGA (13%). These samples show low expression of Claudins and of the epithelial marker E-Cadherin, and high levels of NF-YAl, of the NF-YAl/NF-YAs ratio, and of the mesenchymal marker Vimentin (Fig. 3c). Finally, we used our 158-gene signature to characterize 13 Italian patients (PRJEB43867) [25]: median z-score values are positive in 7 cases, negative in 4 cases, and close to 0 in 2 cases (GC15 and GC6) (Fig. S3a). The 7 cases characterized by a positive z-score value, as well as GC15, present with high levels of NF-YAl and low levels of NF-YAs, which results in a high NF-YAl/NF-YAs value (Fig. S3a). With the exception of CLDN3 in GC15, the expression of CLDN3, CLDN4, and CLDN7 is low in all these samples (Fig. S3b). Altogether, these results confirm the presence of a subgroup of STAD that has low expression of Claudin-3/-4/-7.

Fig. 3

Claudinlow classification in additional primary tumor datasets. a Pie chart of the gastric cancer samples collected from two independent sources, corresponding to the SRA accessions PRJNA764173 and PRJNA1119255. b SigClust2 hierarchical clustering of primary STAD samples detailed in a according to our 158-gene signature. Top: heatmap showing the expression of signature genes across tumor groups identified by hierarchical clustering, with the proposed Claudinlow node outlined in black. Bottom: Dot plot summarizing the expression of signature genes in STAD tumors, represented by the median z-score. c Box plots display the expression levels of CLDN3, CLDN4, CLDN7, NF-YAl, and NF-YAs, along with the NF-YAl/NF-YAs expression ratio, the correlation with the 158-gene signature, and the marker genes CDH1 and VIM in STAD tumors from the independent datasets PRJNA764173 and PRJNA1119255. Expression values are reported in TPM (Transcripts Per Million), while correlations with the 158-gene signature are represented as median z-scores

Claudinlow samples feature unique tumor-stromal interactions

In BRCA, Claudinlow tumors were initially described as exhibiting a pronounced immune and stromal cell infiltration [39]. We decided to assess tumor microenvironment composition with the ESTIMATE algorithm, which employs single-sample Gene Set Enrichment Analysis (ssGSEA) to assign an Immune Score and Stromal Score to each tumor sample based on RNA-seq gene expression [40]. Applying this strategy, we discovered that TCGA STAD Claudinlow tumors were characterized by increased Immune and Stromal Scores compared to epithelial subtypes, while EMT showed intermediate values in the two metrics (Fig. 4a).
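
To make the scoring principle concrete, the sketch below implements a simplified single-sample enrichment score in the spirit of ssGSEA: genes are ranked by expression within one sample, and the score sums the running difference between the rank-weighted cumulative distribution of the gene-set members and the cumulative distribution of the remaining genes. This is an illustrative approximation only. It does not reproduce the ESTIMATE gene lists, its normalization, or its exact implementation; the weighting exponent `alpha = 0.25` and all gene names are assumptions.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: {gene: value} for one sample; gene_set: iterable of gene names.
    """
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    members = set(gene_set) & set(genes)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the sample without covering it")
    # Rank n for the most expressed gene down to 1 for the least expressed.
    weights = {g: (n - i) ** alpha for i, g in enumerate(genes)}
    hit_total = sum(weights[g] for g in members)
    miss_step = 1.0 / (n - len(members))
    p_hit = p_miss = score = 0.0
    for g in genes:
        if g in members:
            p_hit += weights[g] / hit_total
        else:
            p_miss += miss_step
        # Running difference between the two cumulative distributions,
        # summed over the whole ranked list.
        score += p_hit - p_miss
    return score

# Toy sample with 10 hypothetical genes of strictly decreasing expression.
toy_sample = {f"g{i}": float(10 - i) for i in range(10)}
score_top = ssgsea_score(toy_sample, {"g0", "g1", "g2"})     # highly expressed set
score_bottom = ssgsea_score(toy_sample, {"g7", "g8", "g9"})  # lowly expressed set
```

A gene set concentrated among the most expressed genes of a sample yields a positive score and one concentrated among the least expressed genes yields a negative score, which is the behaviour that the Immune and Stromal Scores rely on.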

Fig. 4

Claudinlow gastric tumors show distinct microenvironments. a Immune score and stromal score estimated for each gastric cancer subtype. Claudinlow and EMT tumors display the highest immune and stromal infiltration scores. b Comparison of Claudinlow tumors with the broader gastric cancer cohort from the PRJNA764173 and PRJNA1119255 datasets, highlighting elevated stromal scores in Claudinlow tumors relative to the general gastric cancer population. Error bars represent standard errors of the mean

In the second and third datasets of primary gastric cancers, samples classified as Claudinlow had a higher Stromal Score than the other samples, but a lower Immune Score (Fig. 4b). Together, these results suggest that Claudinlow gastric tumors share with their breast cancer counterparts a distinctive microenvironmental profile marked by elevated stromal content, while immune infiltration appears more variable across datasets.

Insights into the prognostic value of the newly identified subgroup of Claudinlow STADs with a high NF-YAl/NF-YAs expression ratio

To obtain insights into the progression/mortality rates of the 56 Claudinlow tumors and the other subgroups, we performed an evaluation of the clinical data available in the TCGA database. In particular, we compared the PFS (Progression-Free Survival) and the OS (Overall Survival) curves of the Claudinlow subgroup of patients with those of the EMT, MSI, MSS/TP53−, and MSS/TP53+ subgroups. The Kaplan–Meier PFS curves show no statistically significant differences, whereas the OS curves indicate that Claudinlow patients have a significantly lower survival probability than MSS/TP53+ and MSI patients (Fig. 5). On the other hand, there are no significant differences between Claudinlow and EMT cases, both marked by “mesenchymal” phenotypes. Thus, Claudinlow and EMT tumors appear to have the poorest prognoses. Unfortunately, there are no clinical data available for the 231 and 60 patients of PRJNA764173 and PRJNA1119255 classified above. To investigate the markers above further, specifically in stage IV tumors, we interrogated an available RNA-seq dataset (PRJNA1220682) of gastric cancer with or without peritoneal metastasis. The paucity of the samples precludes the definition of clustering as in Fig. 1 and Fig. 3, but the gene expression data of Fig. 6 show that, bar one exception, gastric cancers that produced a distant metastasis (the hallmark of stage IV cancer) have lower Claudin and E-Cadherin levels, and higher NF-YAl levels, NF-YAl/NF-YAs ratios, and Vimentin levels.
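
For readers unfamiliar with the survival curves compared above, the following is a minimal sketch of the Kaplan–Meier product-limit estimator for a single group. The follow-up times and event indicators are invented toy values, not TCGA data, and the log-rank test used for the p values in Fig. 5 is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with >= 1 event.

    times: follow-up durations; events: 1 if the event was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve

# Toy cohort of 5 hypothetical patients (time in months; 1 = death, 0 = censored).
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Censored patients do not lower the curve themselves but shrink the risk set, which is why the step at month 3 in the toy cohort is computed over 3 patients rather than 4.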

Fig. 5

Clinical outcome of the Claudinlow STAD tumors showing a high NF-YAl/NF-YAs expression-ratio. The Figure shows the Kaplan–Meier survival curves of STAD patients. The Progression Free Survival values (Left) and Overall Survival values (Right) across the STAD molecular subtypes are illustrated. The p-values were determined using the log-rank test

Fig. 6

The 158-gene signature strongly correlates with gastric cancers that gave rise to peritoneal metastasis. Bar plots show expression of CLDN3, CLDN4, CLDN7, NF-YAl, and NF-YAs, the NF-YAl/NF-YAs expression ratio, the 158-gene signature score, and the epithelial and mesenchymal markers CDH1 and VIM in individual samples of gastric cancer with or without peritoneal metastasis. Expression data are presented as Transcripts Per Million (TPM), while 158-gene signature values are expressed as median z-scores

A novel classification of the STAD cell lines

The bulk RNA-seq data generated from the TCGA tumors, the 291 additional stomach tumors, and our cohort of Italian patients capture genes expressed not only in the neoplastic cells, but also in tumor-associated immune, stromal, and endothelial cells. Thus, it is important to perform the same type of analyses on the STAD cell lines of the CCLE portal for which RNA-seq data are available.

A first classification of the cell lines, based on the ACRG subtypes, was provided by Lee et al. [30]. However, as illustrated in Fig. 7a, this classification leaves 21 cell lines unclassified, and it does not include the Claudinlow subtype. To obtain a more inclusive classification, we employ the DeepCC deep-learning tool [36] with the Lee et al. categories serving as the training set. Thereafter, we use the ssGSEA (single-sample Gene Set Enrichment Analysis) platform and our 158-gene signature to identify the Claudinlow subgroup, following the ranking of the cell lines according to their z-score values. With the exception of 3 cell lines (RERF-GC1B, NUGC3, and TGBC11TKB), all other lines are classified in specific ACRG subtypes (Fig. 7a). Based on an ssGSEA z-score value > 1, 9 STAD cell lines are classified as Claudinlow (Supplementary Table 5). With the exception of NUGC2 (MSS/TP53−) and RERF-GC1B (unclassified), these Claudinlow cell lines derive from the EMT subgroup defined by the DeepCC analyses.
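
The final relabeling step described above (z-score > 1 on the signature ssGSEA value) can be sketched as follows. The DeepCC step is not reproduced: its output is represented by a plain dictionary of baseline labels. Standardizing the ssGSEA values across cell lines with the population standard deviation is an assumption, and the cell-line names and scores are invented.

```python
from statistics import mean, pstdev

def classify_claudin_low(ssgsea_scores, baseline_labels, threshold=1.0):
    """Relabel as 'Claudinlow' every cell line whose z-scored value exceeds the threshold.

    ssgsea_scores: {cell line: signature enrichment score}
    baseline_labels: {cell line: subtype from a previous classifier}
    """
    values = list(ssgsea_scores.values())
    mu, sd = mean(values), pstdev(values)
    labels = {}
    for line, score in ssgsea_scores.items():
        z = (score - mu) / sd if sd > 0 else 0.0
        if z > threshold:
            labels[line] = "Claudinlow"
        else:
            # Keep the earlier subtype, or mark the line as unclassified.
            labels[line] = baseline_labels.get(line, "unclassified")
    return labels

# Hypothetical cell lines and scores (not CCLE data).
toy_scores = {"LINE_A": 0.9, "LINE_B": 0.1, "LINE_C": 0.0,
              "LINE_D": -0.1, "LINE_E": 0.1, "LINE_F": 0.0}
toy_baseline = {"LINE_A": "EMT", "LINE_B": "MSI", "LINE_C": "MSS/TP53-",
                "LINE_D": "MSS/TP53+", "LINE_E": "EMT"}
toy_labels = classify_claudin_low(toy_scores, toy_baseline)
```

Here only `LINE_A` passes the threshold and moves from EMT to Claudinlow, while `LINE_F`, which has no baseline label, stays unclassified.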

Fig. 7

A novel classification and characterization of STAD cell lines based on our 158-gene signature and the expression of the CLDN3/CLDN4/CLDN7 and NF-YAl/NF-YAs mRNAs. a The left heatmap shows: (i) the original classification (first column) of the STAD cell lines available in the CCLE dataset [30]; (ii) a second classification of the STAD cell lines obtained with the application of the DeepCC deep learning tool (second column); (iii) the classification of the STAD cell lines generated following integration with ssGSEA data (third column). To define the Claudinlow cell lines we consider a ssGSEA z-score value > 1. The right panel shows barplots indicating the TPM (Transcripts Per Million) expression levels of the CLDN3/CLDN4/CLDN7, VIM (vimentin), CDH1 (E-Cadherin), NF-YAl and NF-YAs mRNAs. The rightmost barplot shows the NF-YAl/NF-YAs expression ratio values. b The panel illustrates qRT-PCR determination of CLDN3/CLDN4/CLDN7/VIM/CDH1/NF-YAl/NF-YAs mRNA expression levels in HS746T, LMSU, MKN7, NCIN87, MKN45 and AGS cell lines. The data are expressed as the mean ± SD of the values determined. c Western Blot analysis showing the amounts of the NF-YAl, NF-YAs, VIM and CDH1 proteins in the HS746T, LMSU, MKN7, NCIN87, MKN45 and AGS cell lines

In parallel, we evaluated the expression levels of the mRNAs encoding CLDN3/CLDN4/CLDN7, the EMT marker VIM (Vimentin), the epithelial marker CDH1 (Cadherin-1) and NF-YAl/NF-YAs (Fig. 7a). As expected, the Claudinlow cell lines are characterized by low/undetectable levels of the 3 claudins. The only exceptions are the NUGC2, HS746T and RERF-GC1B cell lines, which express significant amounts of the CLDN4 transcript. The EMT and MSI cell lines characterized by positive z-score values show low CLDN3, CLDN4 and CLDN7 levels. Conversely, the MSS/TP53− and MSS/TP53+ cell lines, with the exception of FU97, present with negative z-score values and high levels of the 3 claudins. The VIM mRNA is expressed in 7 of the 9 Claudinlow, 2 of the 5 EMT and 1 of the 5 MSI cell lines. The CDH1 transcript is expressed in 2 Claudinlow cell lines presenting with undetectable VIM mRNA levels (NUGC2 and RERF-GC1B), one EMT (MKN7) and one MSI (NCC59) cell line (Fig. 7a). As expected, the majority of the cell lines characterized by an “epithelial” phenotype express large amounts of CDH1 mRNA. Finally, the expression of NF-YAl and NF-YAs is associated with two separate “high-to-low” and “low-to-high” TPM (Transcripts Per Million) gradients in all the STAD cell lines, with the exception of the MSI SNU1 cell line.

We validated the RNA-seq data by performing qRT-PCR studies to measure the amounts of CLDN3, CLDN4, CLDN7, VIM, CDH1, NF-YAl, and NF-YAs in several STAD cell lines (Fig. 7b). With the exception of CLDN4 in the HS746T cell line, the qRT-PCR data indicate that the selected Claudinlow cell lines lack CLDN3, CLDN4, and CLDN7 expression. In line with their epithelial characteristics, the MSS/TP53− NCIN87 and MKN45 cell lines express CLDN3, CLDN4, and CLDN7, while the MSS/TP53+ AGS cell line expresses only CLDN4 and CLDN7. Surprisingly, high CLDN3, CLDN4, and CLDN7 expression levels are also observed in the EMT MKN7 cell line. As for VIM, CDH1, NF-YAl, and NF-YAs, the qRT-PCR and the RNA-seq data are entirely consistent. Overall, the qRT-PCR results confirm the RNA-seq analyses, with a single inconsistency represented by CLDN3 expression in the MKN7 cell line.

The intracellular expression of a specific mRNA does not necessarily translate into the production of the encoded polypeptide. We address this point in the case of the NF-YAl, NF-YAs, VIM, and CDH1 transcripts, as the presence/absence of the corresponding protein may be of functional importance for the homeostasis of the neoplastic cell. Thus, we determined the levels of the NF-YAl, NF-YAs, VIM, and CDH1 proteins in the STAD cell lines used for the qRT-PCR validation experiments (Fig. 7c). The results indicate that the amounts of the NF-YAl, NF-YAs, VIM, and CDH1 proteins determined in each cell line reflect the expression levels of the corresponding transcripts in our qRT-PCR. Specifically, both the qRT-PCR and RNA-seq data indicate that CDH1 is absent in the AGS cell line, despite its “epithelial” classification. On the other hand, we note that the only cell line classified as EMT that expresses CDH1 is MKN7, which we use here. We conclude that the Western Blot analysis is in line with the qRT-PCR results shown above.
