Long-read sequencing reveals the landscape of aberrant alternative splicing and novel therapeutic target in colorectal cancer

Characterization of CRC transcriptome using long-read and short-read sequencing methods

To characterize CRC transcripts with high confidence, we utilized both ONT long-read and Illumina short-read sequencing platforms to resolve CRC transcriptomes (Fig. 1A). Our CRC cohort consists of 78 primary human CRC and 10 adjacent normal biopsies (Additional file 2: Table S1). We obtained an average of 7.92 million ONT long reads with an average read length of 1.02 kb per sample, providing a reliable resource for exploring transcript diversity of human CRC transcriptomes (Table 1 and Additional file 2: Table S2). The schematic representation of the analytical pipeline was constructed (Additional file 1: Fig. S1). Over 90% of the long reads were mappable to the human genome, indicating their high specificity (Additional file 1: Fig. S2a). To validate the ability of ONT reads to recover full-length transcripts, we analyzed the percentage of full-length transcripts of human mitochondrial genes, which have a poly(A) tail and lack introns, resulting in fewer splicing complexities. The results revealed that nearly 60% of ONT reads of the mitochondrial genes were full-length, indicating that the ONT reads can recover most full-length transcripts that do not undergo alternative splicing (Additional file 1: Fig. S2b, c). The results demonstrated that these ONT reads can provide a reliable and valuable resource for analyzing CRC transcriptome complexity.

Fig. 1figure 1

Long-read ONT sequencing identifies novel transcripts in CRC. A Workflow for transcriptome reconstruction based on ONT sequencing data. The FLAIR- SQANTI3 pipeline was adopted to assemble transcripts from long-read ONT data based on GENCODE v34. B Transcript length distribution of different types. C Number of transcripts identified per gene by ONT sequencing. D Percentage of long-read ONT transcripts supported by Short-Read Illumina RNA-seq in 78 CRC and 10 adjacent normal samples. E Percentage of long-read ONT transcripts’ TSSs supported by CAGE peaks, or TTSs supported by 3’-seq peaks or presence of a poly(A) motif

Table 1 ONT long-read sequencing statistics

The aligned reads generated by the ONT platform were then polished and assembled using the FLAIR pipeline [22], yielding an average of ~ 27,000 expressed genes with high confidence (Additional file 2: Table S3). Further quality control and transcript annotations were performed using SQANTI3 [26]. Finally, a total of 90,703 transcripts were identified based on a reference transcriptome (GENCODE v34) with a median transcript length of 2.5 kb (Fig. 1B and Additional file 1: Fig. S2d). These transcripts were mapped to 18,024 annotated genes and 2796 novel loci, and nearly 60% (12,271) of genes have two or more transcripts (Fig. 1C). Among the identified transcripts, 37% (33,913) were full-splice matches (FSM) perfectly matching known transcripts, and 40% (36,489), 17% (15,640), and 0.2% (215) were novel in catalog (NIC; refer to transcripts that containing new combinations of already annotated splice junctions or novel splice junctions formed from already annotated donors and acceptors), novel not in catalog (NNC; refer to transcripts that containing at least one novel junctions), and incomplete-splice matches (ISM; refer to transcripts that matching part of a known transcript), respectively (Additional file 1: Fig. S3a, b). In addition to these four transcript types, we also identified a fraction of other transcripts, such as antisense, genic, fusion, and intergenic types. However, we mainly focused on FSM, NIC, and NNC isoforms, which accounted for the vast majority (> 95%). Novel expressed transcripts (except FSM) account for 43 to 58% of all identified transcripts in individual samples (average = 53%, Additional file 1: Fig. S3a). The intriguing phenomenon is that about half of all identified transcripts are novel (NIC and NNC) in both tumor and normal samples (Additional file 1: Fig. S3b). We randomly selected several novel transcripts and validated them using qRT-PCT (Additional file 1: Fig. S3c). Overall, we identified thousands of novel transcripts in the CRC transcriptome that have not been annotated in the current reference.

To validate the reliability of long-read transcripts, we analyzed short-read RNA-seq data by mapping them to the long-read CRC transcriptome. We found that 84% of the FSM transcripts, 71% of the NIC transcripts, and 55% of the NNC transcripts were validated (Fig. 1D). Several NIC and NNC transcripts were validated using Sanger sequencing method (Additional file 1: Fig. S3d). We next used various orthogonal data to further assess the robustness of these transcripts, including CAGE (Cap Analysis Gene Expression, which measures the 5’ end of transcripts), 3’-seq (which measures the ploy(A) tail of transcripts), and some intrinsic sequencing properties. Benchmarking the novel transcripts against the FSM transcripts, we found that the 3’ ends of novel transcripts were supported by both poly(A) motifs detected by SQANTI3 and bona fide TTS (transcription termination sites provided by 3’-seq assays). The 5’ ends of novel transcripts also showed comparable quality to FSM transcripts (Fig. 1E). For example, our long-read ONT CRC transcriptome identified a novel CDA transcript derived from an alternative TSS (transcription start site), which is supported by proximal CAGE peaks. We also found PPIH transcripts with a novel TTS supported by 3’-seq (Additional file 1: Fig. S3e, f). For some intrinsic sequencing properties, all identified transcripts exhibited an extremely low rate of non-canonical junction usage and reverse transcriptase template switching artifacts (Fig. 1G). Most transcripts (33,271, 58%), particularly novel transcripts, were expressed in only one sample, and 3513 isoforms (6%) were detected in all samples (Additional file 1: Fig. S3g). Notably, rarefaction curve analysis revealed that the discovery of known transcripts reached saturation, yet the discovery of novel transcripts remained unsaturated (Additional file 1: Fig. S3h). Our study provides a high-resolution annotation for the CRC transcriptome.

Characteristics of novel long-read transcripts

Out of the 56,790 novel transcripts discovered, 36,489 (64%) were classified as NNC and 15,640 (27%) were classified as NIC, highlighting the intricate complexity of the CRC transcriptome. Interestingly, we observed a strong positive correlation between the number of exons and the number of novel transcripts, indicating that higher exon complexity leads to an expanded transcript repertoire (Additional file 1: Fig. S4a). Novel transcripts may be derived from new transcriptional start sites (TSSs) and transcriptional terminate sites (TTSs). In fact, novel transcripts with new TSSs were more likely to contain premature stop codons, which are associated with nonsense-mediated mRNA decay (NMD) (Fig. 2A). Furthermore, novel transcripts tended to have more exons but shorter coding sequence. We also observed that NIC transcripts had longer transcripts compared to NNC transcripts (Additional file 1: Fig. S4b-e). The proportion of highly expressed novel transcripts was found to be lower than that of the reference transcripts (Additional file 1: Fig. S4f).

Fig. 2figure 2

Characteristics of long-read ONT novel transcripts. A Percentage of long-read ONT transcripts with TSS 100 bp away from known TSS (left), with TTS 100 bp away from known TTS (center), and with predicted nonsense-mediated mRNA decay (right). B Correlation of transcript-level expression calculated by short-read and long-read sequencing. C The average expression distribution of different types of transcripts, Wilcoxon signed-rank test, *** P < 0.001. D Pathways significantly enriched for genes with an elevated number of novel transcripts detected by long-read ONT sequencing in CRC. Bubble size denotes the number of genes with novel transcripts in each pathway, and color denotes the significance. E Enrichment analysis of oncogenes and tumor suppressor genes with novel transcripts detected by long-read ONT sequencing in cancer. F Number of novel long-read ONT transcripts compared to annotated GENCODE v34 transcripts for selected oncogenes. G Correlation between phyloP score and phastCon score of novel transcripts. H Percent of long-read isoforms predicted to contain an ORF by Transdecoder. I Number of predicted ORFs validated by CPTAC LC–MS/MS proteomics data

We next compared the genes and transcript expression levels generated by long-read ONT and short-read Illumina platforms. As expected, gene and transcript quantifications from long-read data were highly concordant with those from short-read data (Spearman correlation coefficient r = 0.666 for genes and 0.507 for transcripts, respectively; Fig. 2B). Further analyses indicated that the average expression level of novel transcripts (NIC + NNC) was lower than that of known transcripts (FSM) (Fig. 2C). We then conducted pathway enrichment analysis on individual genes with a high gain of novel splice isoforms, selecting those genes with a twofold increase for analysis (see Additional file 2: Table S4). Our results showed that genes with a high gain of novel isoforms are associated with cancer-related signatures, such as RNA splicing, TGF pathway, JAK/STAT pathway, and MYC pathway (Fig. 2D). Notably, oncogenes gained a substantial number of novel transcripts that may play an important role in carcinogenesis and tumor progression. Further analysis indicated that oncogenes were significantly overrepresented in the gene list with a twofold increase, while tumor suppressor genes were underrepresented in this gene set (Fig. 2E). Our analysis revealed that certain oncogenes, including CDH17, EZH2, and ERBB2, exhibited a higher number of novel isoforms (Fig. 2F). Several novel transcript isoforms were validated using qRT-PCR (Additional file 1: Fig. S4g). These findings suggest a potential role for these isoforms in CRC development and progression.

To evaluate the function of the non-coding regions of the genome, which comprise over 98% of the genome and are largely unknown, we assessed the evolutionary conservation of the novel non-coding transcripts. We used phyloP and phastCons scores to perform sequence conservation analysis and found that the novel transcripts had higher conservation scores than random control regions (Fig. 2G). To further characterize the identified novel transcripts that might function as putative protein-coding genes, we evaluated those transcripts with high coding potential. We extracted open reading frames (ORF, length ≥ 300 bp) via sequence homology analysis against the Uniprot database using blastp and functional protein domain prediction using HMMER against the Pfam database [31]. The results indicated that the majority of novel transcripts had the potential to encode proteins (Fig. 2H and Additional file 1: Fig. S5a). To determine whether long-read ONT novel transcripts can encode novel proteins, we compared the predicted ORFs to their closest match in the UniProt database. The results indicated the majority of FSM transcripts encoded ORFs were > 99% identical to the protein in the UniProt database (Additional file 1: Fig. S5b). In contrast, fewer ORFs encoded by NIC or NNC transcripts were annotated in the UniProt database, and a large proportion of ORFs was complete, suggesting that the long-read ONT method may provide much more information for CRC transcriptome.

Beyond transcript annotation, our pipeline leveraged existing CRC proteomics data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) project to validate our novel ORFs. We intersected our data by spectral matching novel ORFs encoded peptides with 97 publicly available CRC samples and 100 adjacent normal samples profiled by TMT10plex MS/MS proteomics. The results showed that 3729 NIC and 817 NNC novel ORFs encoded peptides were supported by LC–MS/MS data. Additionally, we also found 1883 annotated FSM transcripts producing novel ORFs not present in the UniProt database (Fig. 2I and Additional file 2: Table S5). Of all the ORFs with peptide support, we used the DeepLoc [33], a deep neural network-based tool, to predict the subcellular location of these proteins, with about half located in the cytoplasm or nucleus (Additional file 1: Fig. S5c). The TMHMM analyses showed that 606 novel ORFs had transmembrane domains, suggesting the important role of these previously undefined proteins (Additional file 1: Fig. S5d). Although validating novel transcripts with unique peptides is challenging, LC–MS/MS analysis detected the protein products of three selected novel transcripts (MSTRG.373437.7, MSTRG.17227.18, and MSTRG.42052.109) at their isoform-specific junction regions (Additional file 1: Fig. S5e). These results suggested that previously undetected transcripts can encode novel protein isoforms, which might play a role in producing neoantigens in cancer.

We next systematically analyzed the gene fusion based on the long-read ONT sequencing of CRC samples [27]. The results revealed that a total of 4538 fusion events were detected in our CRC cohort, of which the ARHGEF3-CNTNAP2, A2MP1-PTMA, and ACAT2-TCP1 fusion events were considered to be the frequent events in CRC (Additional file 2: Table S6). The majority of identified fusion transcripts have not been detected, which indicates the role of this “dark matter” in CRC, allowing for further investigation.

The landscape of dysregulated AS events in CRC transcriptome

We annotated the AS events based on ONT long-read data using SUPPA2 software [38] and quantified the levels of seven types of splicing events, including skipping exons (SE), alternative 3’ sites (A3), alternative 5’ sites (A5), alternative first exon (AF), alternative last exon (AL), intron retention (RI), and mutually exclusive exon (MX) categories (Fig. 3A). We identified a total of 2,363,976 AS events, with AF events contributing to the majority. Of these, only 565,952 AS events were identified using GENCODE v34 reference (Additional file 1: Fig. S6a). To obtain a reliable AS events dataset, we implemented rigorous filtration criteria (i.e., PSI frequency ≥ 75% and average PSI value ≥ 0.05). As a result, we obtained a total of 1,085,182 AS events, of which 352,238 AS events were identified based on the GENCODE v34 reference (Additional file 1: Fig. S6b, c). The UpSet plot revealed that 6725 genes have all seven types of AS events. It is also noteworthy that > 90% of genes have two or more AS events (Fig. 3B). The various combinations of these AS events largely enriched the diversity of the CRC transcriptome.

Fig. 3figure 3

Identification of alternative splicing events in CRC. A A schematic diagram of the seven types of AS events including A5: Alternative 5’ Splice Site, A3: Alternative 3’ Splice Site, AF: Alternative First Exon, AL: Alternative Last Exon A3, MX: Mutually Exclusive Exon, RI: Retained Intron, and SE: Skipping Exon. B UpSet plot of interactions between the seven types of detected AS events in CRC. Percentage of seven types of AS events detected in CRC (upright). One gene may have up to all seven types of alternative splicing. C The volcano plot visualizing DEAS events between CRC and adjacent normal tissues. The red and blue points in the plot represent the DEAS events with statistical significance. D Enrichment analysis of genes involved DEAS events for hallmark pathways from MSigDB. Each row represents a hallmark geneset/pathway, whereas each column indicates a type of DEAS events. E Distribution of PSI for EIF4H, YWHAE, XPO1, and EEF1B2 in CRC and adjacent normal tissues. P values were calculated by two-sided Mann–Whitney test

To explore the aberrant AS events that occur in CRC. Differential analysis was performed on AS events between tumor and adjacent normal samples, resulting in the identification of 25,002 differentially expressed AS (DEAS) events with an adjusted P-value ≤ 0.05 and |ΔPSI|≥ 0.2 (Fig. 3C and Additional file 2: Table S7). The majority of DEAS events were AF events (8898, 35.59%) (Additional file 1: Fig. S7a). There was no significant difference in the distribution of ΔPSI across seven types of AS events (Additional file 1: Fig. S7b). Unsupervised hierarchical clustering based on PSI values of DEAS events was able to distinctly separate tumor and adjacent normal samples (Additional file 1: Fig. S7c). Next, we conducted gene set enrichment analysis (GSEA) on the parental genes that harbored DEAS events to determine their association with hallmark pathways. The GSEA results showed that genes with DEAS events were enriched in pathways such as G2/M cell cycle checkpoint, E2F signaling pathway, MYC signaling pathway, Mitotic Spindle, and MTORC1 signaling pathway (Fig. 3D). Intriguingly, different types of AS events were significantly enriched in specific pathways. For example, only differentially expressed AF events were overrepresented in the WNT signaling pathway, while the p53 signaling pathway was exceptionally enriched in RI events (Fig. 3D). Furthermore, we identified specific genes involved in MYC signaling pathways that underwent significant AS changes during tumorigenesis. For instance, EIF4H displayed loss of intron retention that was highly specific to CRC. YWHAE showed significant changes in alternative first exon usage between tumor and adjacent normal samples. XPO1 exhibited significant exon skip alterations in tumors compared to adjacent normal samples, while the alternative 3’ change of EEF1B2 showed a significant difference between tumor and adjacent normal samples (Fig. 3E). These findings suggest that DEAS events might act as complicated roles through interacting with multiple cancer-related pathways in CRC.

Splicing factors (SFs) have been documented to play pivotal roles in RNA splicing process, and a substantial portion of DEAS events was directly regulated by the differentially expressed SFs [46]. Among those previously curated 67 SF genes, we found that 17 SF genes were differentially expressed between tumor and adjacent normal samples (Additional file 2: Table S8). We then constructed dysregulation networks by calculating the correlation between expression levels of differentially expressed SFs and DEAS events (Additional file 1: Fig. S8a and Additional file 2: Table S9). In the dysregulation network, we found that 13 SFs (11 upregulated and 2 downregulated SFs in tumors) orchestrated thousands of DEAS events involved in various biological pathways. Notably, ELVAL3 and QKI were observed to modulate a greater number of DEAS events and pathways, indicating that they may play crucial roles in CRC (Additional file 1: Fig. S8b). The pathway enrichment analysis on the DEAS events regulated by SFs indicated that most pathways were significantly associated with cancer metabolisms, such as hypoxia, HIF1A signaling pathway, and Ribosome (Additional file 1: Fig. S8c). Overall, our findings demonstrate that SFs regulate both metabolism-related and other crucial pathways in CRC. In particular, SFs such as ELVAL3, QKI, RBM5, HNRNPA1, and SF1, which are at the center of the regulatory dysregulation network, may play pivotal roles in CRC carcinogenesis and progression.

Clinical outcome associated with alternative splicing events

Due to the highly variable features of AS events, we next explored whether their usage might highlight new prognostic biomarkers in CRC. We performed univariate Cox analysis to assess the correlation between the overall survival rate and each DEAS event. In total, we have identified 1472 DEAS events corresponding to 1241 genes that showed prognostic significance (Additional file 2: Table S10). Of these, 809 DEAS events were associated with a favorable prognosis, while 663 DEAS events were associated with a worse prognosis. Notably, we found 38 genes were identified having both favorable and worse prognostic AS events. The top 30 DEAS events corresponding to 27 distinct genes correlated with overall survival in CRC were shown (Fig. 4A). Intriguingly, four AS events of FHL were associated with overall survival. The skipped exon and alternative first exon of FHL2 were associated with a decreased survival time, while the other two alternative first exon events of FHL2 were associated with prolonged overall survival time in CRC patients (Fig. 4B–D). Overall, our findings suggest that DEAS events may serve as valuable prognostic biomarkers for CRC. The complexity of DEAS events in genes such as FHL2 may lead to specific isoforms during tumorigenesis and progression, highlighting the need for further research to elucidate the functional roles of different isoforms. Our study provides new insights into the close relationship between DEAS events and CRC prognosis, offering a promising avenue for future investigation.

Fig. 4figure 4

Identifications of prognostic DEAS events in CRC. A Top 30 prognostic DEAS events are shown, ranked by differential survival. AS events are labeled with gene name, AS event type, and the number of patients. Each AS event is depicted in the heatmap, including survival prognosis, differential status, and average PSI values. B Gene structure of the known and novel TSS of FHL2 transcript according to the ONT long-read transcriptome of CRC tissues. C Kaplan–Meier survival analysis of the correlation between three types of FHL2 DEAS events and patients’ overall survival rate, log-rank test. D Distribution of PSI for 3 types of FHL2 AF events in CRC and adjacent normal tissues, P values were calculated by Two-sided Mann–Whitney test

To further investigate the dysregulation of alternative splicing in CRC. We analyzed the DEAS events between the advanced stage (AJCC Stage III + IV) and early-stage (AJCC Stage I + II) CRC patients. In total, we identified 998 AS events to be differentially expressed (adjusted P-value ≤ 0.05 and |ΔPSI|≥ 0.2, Additional file 2: Table S11). Furthermore, we analyzed the metastasis-related DEAS, and the result showed that 596 DEAS events were identified to be upregulated in CRC samples with distant metastasis, while 583 DEAS events were found to be significantly decreased (Additional file 2: Table S12). The microsatellite instability (MSI) was considered to be a robust prognostic and predictive biomarker to guide therapeutic decision-making including the use of immunotherapy [47]. We predicted the CRC MSI status based on the transcriptome data by PreMSIm [44], and the results indicated that 20 patients were predicted as MSS/MSI-L status, while the 58 remaining CRC patients were predicted to be MSI-H. Further analysis showed that 744 DEAS events were identified to be upregulated in CRC samples with MSI-H, while 633 DEAS events were identified to be significantly decreased in MSI-H patients (Additional file 2: Table S13), implying that alternative splicing dysregulation may be associated with immunotherapy response. These results suggest that dysregulation of alternative splicing is closely related to CRC progression and therapeutic choice, indicating that it may serve as important biomarkers and potential therapeutic targets.

Besides clinicopathological correlation, we also investigate the relationship between the molecular subtypes and alternative splicing. The consensus molecular subtypes (CMS) is a robust and useful classification system that allows to categorize CRC patients into four subtypes by transcriptome analysis [48]. The CMS classification system is significantly correlated with CRC patient’s overall survival rate and can guide therapeutic decision-making in clinical practice. Firstly, we predicted the CMS for each patient based on the transcriptome data by CMScaller [43]. Then, we analyzed the CMS specific AS events in different CMS subtypes. Each CMS subtype-specific AS event was defined as high level of PSI value in the subtype versus the remaining subtypes (adjusted P-value ≤ 0.05 and |ΔPSI|≥ 0.2). The results showed that a total of 4823 AS events were considered to be CMS subtype specific, of which 1579, 930, 1324, and 990 AS events were found to be highly prevalent in CMS1, CMS2, CMS3, and CMS4 CRC patients, respectively (Additional file 2: Table S14). Regarding CMS subtypes, it is well appreciated that tumor microenvironment including stromal or immunoinfiltration largely affected the CRC prognosis [49]. To analyze the correlation between tumor microenvironment and alternative splicing, we employed the estimate to characterize the stromal and immune cell infiltration [45], and then analyzed the correlation between CRC microenvironment and splicing alterations in CRC samples. The results showed that 237 AS events were found to be significantly correlated with stromal content, while 100 AS events were found to be significantly correlated with immunoinfiltration (adjusted P-value ≤ 0.05 and |Spearman correlation coefficient r|≥ 0.5), suggesting the potential role of alternative splicing in CRC microenvironment remodeling (Additional file 2: Table S15 and Additional file 2: Table S16).

Dysregulation of TIMP1 exon 4–5 splicing in CRC

Among those identified DEAS events, the PSI value of TIMP1 exon 4–5 was significantly elevated in CRC samples (Fig. 3C). The human TIMP1 gene spans approximately 4.4 kb and contains 6 exons. In this study, we found for the first time that TIMP1 gene can generate a new transcript by alternative exon skipping exon 4–5 (TIMP1 Δ4-5) based on our ONT long-read data (Fig. 5A). The PSI value of TIMP1 Δ4-5 was significantly elevated in CRC samples (Fig. 5B). To investigate whether TIMP1 isoforms with (TIMP1-FL) or without of exon 4–5 (TIMP1 Δ4-5) is differentially expressed between CRC and adjacent normal tissues, we analyzed their expression values using Illumina short reads data. The results showed that the mRNA expression level of TIMP1-FL was significantly upregulated in cancerous tissues, while the expression level of TIMP1 Δ4-5 was significantly decreased in CRC tissues (Fig. 5C, D). Moreover, the increased expression of TIMP1-FL was significantly associated with poorer overall survival, while the increased expression of TIMP1 Δ4-5 leads to better survival in CRC patients (Fig. 5C, D and Additional file 1: Fig. S9a).

Fig. 5figure 5

Dysregulation of TIMP1 exon 4–5 splicing in CRC. A Genome browser view depicts the coverage of exons in TIMP1 in representative normal and tumor tissues. B Distribution of PSI value of TIMP1 exon 4–5 skip events in CRC and adjacent normal tissues, P values were calculated by two-sided Mann–Whitney test, *** P < 0.0001. C Expression level of full-length TIMP1 transcript (TIMP1-FL, left) or TIMP1 transcript with exon 4 and 5 exclusion (TIMP1 Δ4-5, right) in CRC and adjacent normal tissues, P values were calculated by two-sided Mann–Whitney test, *** P < 0.001. D Kaplan–Meier survival analysis of the correlation between TIMP1-FL (left) or TIMP1 Δ4-5 (right) expression and patients’ overall survival rate, log-rank test. E The Scheme describes the primer design strategy to characterize the exon 4 and 5 inclusion or exclusion of TIMP1. F Sanger sequencing analysis indicated the exon3-exon6 junction in NCM-460 and SW480 cells. G RT-PCR results of TIMP1 in different CRC cancer cell lines and normal human colon mucosal epithelial cell line. H qRT-PCR results of TIMP1-FL (left) or TIMP1 Δ4-5 (right) expression in CRC cancer cell lines and normal human colon mucosal epithelial cell line. P values were calculated by two-sided Student’s t test, * P < 0.05, ** P < 0.01, *** P < 0.001. I qRT-PCR results depict the ratio of transcripts with or without exon 4–5 in 12 pairs of CRC and adjacent normal tissues. P values were calculated by two-sided Student’s t test, ** P < 0.01

To further validate the splicing role of TIMP1 gene in CRC, we designed three sets of primers. Primer set 1 covered a fragment spanning exon 3 to exon 6 to validate TIMP1-FL and TIMP1 Δ4-5 transcripts (Fig. 5E). Moreover, primer set 2, which covered the exon 3/exon 4 and exon 5/exon 6 junctions, was designed to quantify TIMP1-FL transcript. Primer set 3 covered the exon 3/exon 6 junction and was used to quantify TIMP1 Δ4-5 transcript (Additional file 1: Fig. S9b). Sanger sequencing results confirmed the presence of TIMP1 Δ4-5 isoform in both NCM-460 and SW480 cells (Fig. 5F). Further RT-PCR and qRT-PCR results showed that the expression level of TIMP1-FL was significantly upregulated, while the expression level of TIMP1 Δ4-5 was significantly downregulated in CRC cell lines (Fig. 5G, H). Importantly, we identified the same expression pattern in clinical CRC samples (Fig. 5I and Additional file 1: Fig. S9c). The ratio of exon 4–5 inclusion versus exclusion (TIMP1-FL/TIMP1 Δ4-5 mRNA) was dramatically increased in CRC tissues compared with adjacent normal tissues. The data suggested that dysregulation of TIMP1 exon 4–5 alternative splicing exists in CRC samples.

TIMP1 Δ4-5 and TIMP-FL have antithetical functions in CRC carcinogenesis

It has been documented that TIMP1-FL expression is correlated with tumorigenesis and metastasis of human CRC [23]. To investigate the individual contributions of TIMP1 Δ4-5 and TIMP1-FL in CRC, we overexpressed TIMP1 Δ4-5 and TIMP1-FL to CRC and we overexpressed TIMP1 Δ4-5 and TIMP1-FL in CRC cell lines. The overexpression efficiencies were confirmed by qRT-PCR (Additional file 1: Fig. S9d). Our findings showed that overexpression of TIMP1 Δ4-5 significantly suppressed cell growth, migration, and invasion of SW480 and HCT-8 cells, while overexpression of TIMP1-FL remarkably increased cell growth, migration, and invasion abilities of both SW480 and HCT-8 cell lines reduced cell migration and invasion abilities, which is in concordance with previous study (Fig. 6A–D) [23]. Conversely, we employed siRNA targeting exon 3/exon 6 junction to knockdown TIMP1 Δ4-5 expression in CRC cells. The results indicated that knockdown of TIMP1 Δ4-5 had no effects on TIMP1-FL expression. Knockdown of TIMP1 Δ4-5 significantly increased cell proliferation, migration, and invasion abilities (Additional file 1: Fig. S9e-g). The data suggested that dysregulation of TIMP1 Δ4-5 may function as a suppressor of CRC carcinogenesis.

Fig. 6figure 6

TIMP1 Δ4-5 and TIMP-FL have antithetical functions in CRC carcinogenesis. A Exogenous overexpression of TIMP1 Δ4-5 and TIMP-FL in TIMP1-KD HCT-8 and SW480 cells. B, C Colony formation (B) and MTT (C) assay of exogenous overexpression of TIMP1 Δ4-5 and TIMP-FL in TIMP1-KO HCT-8 and SW480 cells. P values were calculated by two-sided Student’s t test, * P < 0.05, ** P < 0.01. D Tumor cell migration and invasion assay of exogenous overexpression of TIMP1 Δ4-5 and TIMP-FL in TIMP1-KO HCT-8 and SW480 cells. P values were calculated by two-sided Student’s t test, ** P < 0.01, *** P < 0.001. The migrated or invaded cells were quantified by counting in five fields. Scale bar, 100 μm. E Xenograft mouse model established using SW480 cells was stably infected with lentivirus-based TIMP1 Δ4-5 or empty vector in BALB/c nude mice (n = 6 mice per group). In vivo generated tumors are depicted. F, G Analysis of tumor growth (F) and weight (G) in the xenograft mouse model. Data are presented as mean ± SEM of n = 6 mice per group. Two-way ANOVA and one-way ANOVA followed by Tukey test. H Representative H&E and IHC images of randomly selected tumors were shown. Scale bar, 100 µm

We also evaluated the in vivo effects of TIMP1 Δ4-5 overexpression on tumor growth in a nude mouse xenograft model, which showed that cells overexpressing TIMP1 Δ4-5 generated smaller tumors compared to controls (Fig. 6E–H). Collectively, our data indicate that TIMP1-FL and TIMP1 Δ4-5 play distinct roles in CRC carcinogenesis and that TIMP1 Δ4-5 may act as a potential tumor suppressor in CRC.

SRSF1 sustains the exon 4–5 inclusion of TIMP1 in CRC

To better understand how splicing factors regulate TIMP1 exon 4–5 inclusion, we used RBPmap (http://rbpmap.technion.ac.il/) to predict potential splicing factors that bind to TIMP1 pre-mRNA [50]. The result showed that TIMP1 exon 4–5 inclusion or exclusion is regulated by five splicing factors potentially (SRSF1

Comments (0)

No login
gif