The CAGE profiling results in 45,861 TSS and 20,403 active enhancers (Fig. 1a and Data Availability). The majority of the TSS clusters from the CAGE profiles were located within gene promoter regions (Fig. 1B). However, novel TSS were also found in other locations categories (proximal, 5′-UTR, 3′-UTR, CDS, etc.) As could have been expected, most of the enhancer clusters were detected within the intron and intergenic regions (Fig. 1B). In addition to the cluster annotation, we looked at the expression of the clusters within each genomic annotation category. Fig. 1C indicates that TSS annotated promoters were highly expressed compared to the other TSSs (novel TSSs). As expected, enhancer expression was lower than TSSs. We analyzed the shape of the TSS to find the sharp and broad promoters [23]; using the highly expressed TSSs with TPM ≥ 10, we calculated the Interquartile Range (IQR) and found 9916 broad promoters and 772 sharp promoters (Fig. 1D). The sequence log of two types of the promoters shows that sharp TSSs tend to have a stronger TATA-box upstream of the TSS compared to broad TSSs (Fig. 1E). Finally, we investigated the correlation between the CAGE profiles with eight libraries generated from human whole-blood samples in the Functional ANnoTation Of the Mammalian genome (FANTOM5) project [11, 24]. Pearson’s correlation coefficients indicate a positive correlation in the expression profiles of the FANTOM5 whole-blood samples and the 24 samples from or study as indicated by the correlation heatmap matrix in (Fig. 1F top). Furthermore, Spearman’s correlation between sample from the current study and sample from FANTOM5 was 0.73 (Fig. 1F bottom).
Differential expression analysis of TSSs and enhancersWe found a set of TSS and active enhancer regions differentially expressed (DE) between non-frail and frail (frailty) and male and female (sex) participants (Data Availability). The DESeq2 analysis result revealed 11:45,207 DE TSSs and 72:20,404 DE active enhancers (Padj < 0.05) (Fig. 2A left). The total number of DE regions is 83. When we intersect the 83 DE regions with refTSS database, we found 19 overlapped regions. Likewise, between male and female, we found 89:45,207 DE TSS and 13:20,404 DE active enhancers (Padj < 0.05) (Fig. 2A right). The total number of DE regions is 102. The intersect of the 102 DE regions with refTSS database gave 171 overlapped regions.
Fig. 2Differential expression analysis of the detected TSS and enhancers. A Diagnostic MA plot of the differential expression analysis of TSS and enhancer regions frail vs. non-frail (left) and male vs. female (right). B Heatmap of GWAS-LD enrichment analysis of the differentially expressed TSS and enhancer regions. The labels represent the GWAS trait, and the numbers in the cells are the FDR values. C Expression profile of top differentially expressed TSS frail vs. non-frail. D Mean expression OVCH1-AS1. Error bars represents standard error of the mean. E Expression profile of top differentially expressed enhancer frail vs. non-frail. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001; ns, not significant by a two-way ANOVA test
Functional annotation and GWAS-LD enrichment analysis of the DE regionsAnnotation of genes associated with DE regions between non-frail and frail (frailty differences)The intersection of the DE TSS and active enhancers regions (non-frail vs. frail) with refTSS database resulted in 19 genomic regions. We found four genes associated with 10 regions (9 regions are not associated with any genes) (Supplementary Table 1). OVCH1 Antisense RNA 1 (OVCH1-AS1) is among the associated genes. OVCH1-AS1 is a lncRNA and belongs to the antisense RNAs and has 5 transcripts. Antisense RNAs were found to play a role in regulating gene expression during replication, transcription, and translation [25]. In the NHGRI-EBI Catalog of human GWAS, 10 associations (variant and risk allele) were associated with OVCH1-AS1. Alzheimer disease and age of onset are among the top traits for OVCH1-AS1. The second gene was free fatty acid receptor 2 (FFAR2), a protein coding gene with 3 transcripts. Cell line study found that FFAR2 is expressed in neuronal cells [26]. The study observed that FFAR2 inhibition increases Aβ-Induced neurotoxicity. Using DAVID Bioinformatics tool [16], we found that the cAMP signaling pathway from KEGG was associated with FFAR2 as well as immunity and inflammatory response biological process. Diversion colitis and inflammatory bowel disease (IBD) are two diseases associated with FFAR2 [14]. FFAR2 has 19 associations in GWAS catalog and are related to immune and blood cells. The third gene was Major Histocompatibility Complex, Class I (HLA-K) a pseudogene with 12 associations in GWAS catalog; top trait was macrophage inflammatory protein 1b levels. The fourth gene was the Late Endosomal/Lysosomal Adaptor, MAPK And MTOR Activator 4 (LAMTOR4), a protein-coding gene associated with mTOR signaling pathway, regulation of cell size, and positive regulation of TOR signaling. LAMTOR4 has 12 transcripts and 2 GAWAS associations in GWAS catalog related to venous thromboembolism and Alzheimer disease.
Annotation of genes associated with DE regions between male and female (sex differences)The set of genes associated with DE regions between male and female is listed in Supplementary Table 1. Among the list of the genes is S100A8 (S100 calcium binding protein A8), linked to diseases of immune system and an indicator of macrophage activation [27]. CD44 molecule (Indian blood group) and C-type lectin domain family 2 member B are in the list of genes, and both are linked to innate immune system. We found tetratricopeptide repeat domain 9 genes associated with DE regions, and this gene is linked to mammary gland development pathway. Interestingly, we found ring box protein-1 in the list of genes, ring box linked to invasive bladder transitional cell carcinoma and associated with a poor prognosis and tumor progression in esophageal cancer [28]. Several DE regions are annotated as lncRNA genes like X-inactive specific transcript (XIST) and Testis Expressed Transcript, Y-Linked 14 (TTTY14). We found the ZFX antisense RNA 1 (ZFX-AS1) is associated with DE TSS and enhancers between male and female.
In the Human Ageing Genomic Resources [17], the DE promoter TSS region chr6:366,786,54-366,787,97;+ overlapped all transcripts (n=8) from HGCN cyclin-dependent kinase inhibitor 1A (CDKN1A) gene, a member of human CD1 gene family. CDKN1A (coding for P21), a protein coding gene, is involved in several KEGG pathways (e.g., FoxO signaling pathway, p53 signaling pathway, cellular senescence, and different cancer type pathways) [29]. In response to DNA damage, CDKN1A involved in TP53 mediated inhibition of cellular proliferation [30]. Interestingly, a SNP in CDKN1A was significantly associated with longevity in a cohort from the Bologna group in central Italy [31].
GWAS-LD enrichment analysis of DE TSS and enhancers for frailtyGWAS-LD enrichment of analysis of the 19 DE regions (overlapped with refTSS) (non-frail vs. frail) using the GWAS-LD tool identifies GWAS traits associated with the DE regions (Fig. 2B). Top GWAS traits in (Fig. 2B) were the plantar warts, isovolumetric relaxation time, hemoglobin, and frontal fibrosing alopecia (FDR 0.0053). Other GWAS traits associated with the DE regions are major depressive disorder lifetime and urinary tract infection frequency, a common infection in old adults.
Observed expression variations of the top differentially expressed TSS and enhancersWe looked at expression variations and annotation of the top differentially expressed TSS regions between non-frail and frail participants (Fig. 2C, D) and active enhancers regions (Fig. 2E). The TSS region chr1:207104120-207104377;+, located upstream of the Complement Component 4 Binding Protein Alpha (C4BPA) gene. A cancer cell line study [32] demonstrated that C4BPA was expressed intracellularly in cancer cells and interacts with the NF-κB family member RelA and regulates apoptosis. The same study found that C4BPA mutations are associated with improved cancer survival outcome. The second TSS region was chr2:230927722-230927775;−. This region overlapped Ensemble gene ENSG00000283164, a novel transcript of type lncRNA, antisense to G protein-coupled receptor 55 (GPR55) gene. Experimental evidence [33] and sequence similarity analysis suggested that GPR55 may be involved in hyperalgesia associated with inflammatory and neuropathic pain and may play a role in bone physiology. The third TSS region was chr11:21753419-21753559;+. This region overlapped 56 variants in Ensemble gene variant table. All variants’ consequences are intergenic variants associated with the ncRNA and uncharacterized gene LOC102723370. The fourth TSS region was chr10:133565532-133565621;−. This region overlaps the protein coding gene synaptonemal complex central element protein 1 (SYCE1). SYCE1 belongs to disease-related genes in The Human Protein Atlas [15], and diseases associated with SYCE1 include spermatogenic failure and premature ovarian failure [14]. The fifth TSS region was chr21:46605059-46605167;− overlapping protein coding gene S100 Calcium Binding Protein B (S100B). In the GeneCards database, two diseases are associated with S100B gene, syringoma, and neurofibroma. The sixth TSS region was chr22:22758650-22758750;+. This region overlaps the Immunoglobulin Lambda Variable 2-14 gene (IGLV2-14) involved in adaptive immunity [34].
We focused more on the locus OVCH1-AS1 of type long non-coding and belongs to the antisense (AS) RNA gene group and play a role in the regulation of vascular aging. AS-lncRNA mainly affects the function of endothelial cells (ECs) and vascular smooth muscle cells (VSMCs.) [35]. The barplot in Fig. 2D shows significant difference in the expression of OVCH1-AS1 between frail and non-frail participants. In the GWAS catalog, OVCH1-AS1 overlaps 10 variant and risk alleles. Alzheimer disease, gut microbiota, and alkaline phosphatase are among the reported traits associated with OVCH1-AS.
CAGE also enabled testing whether a gene shows differential TSS usage or not for genes with multiple TSSs. Supplementary Fig. 4 shows alternative TSS usage within genes, most of the genes used only a single TSS, and only a minority of genes used 2 or more TSSs (up to 7 TSSs).
Analysis of the expression of age-related genes between non-frail and frail participantsWe investigated the expression of the previously described age-related genes from a meta-analysis of the multiple age-related gene expression profiles [36]. We tested the correlation of the expression of the age-related genes between non-frail and frail participants; the correlation analysis shows strong Spearman’s correlation for both over-expressed (Fig. 3A) and under-expressed genes (Fig. 3B).
Fig. 3Correlation analysis of the expression of age-related gene between non-frail and frail. Correlation analysis of the age-related gene expression between non-frail and frail subjects. A Over-expressed genes and B under-expressed genes. Source of the genes [36]
Weighted gene co-expression network analysisThe weighted gene co-expression network analysis (WGCNA) enabled the integration of the expression matrix for each sample and their corresponding demographic and clinical laboratory data. The gene expression matrix for all participants was used as input for the WGCNA, and the samples were clustered based on the expression matrix. This analysis did not show any evident outliers in the data. The dendrogram in Fig. 4A was annotated by the frailty variable (non-frail and frail)
Fig. 4Weighted correlation network analysis. A Clustering of sample expression profiles. The dendrogram based on sample Euclidean distance (top) and heatmap of the basic demographic information, frailty data, clinical variable, and BMI. In the heatmap, white means a low value and red means a high value for continuous variables (age, COLHDL, triglycerides, red blood cells, hemoglobin, hematocrit, blood glucose, blood urea nitrogen, serum creatinine, and BMI). For gender, white means female and red means male. For frailty, white means frail and red means non-frail. B Genes clustering dendrogram with the assigned module colors (22 modules). Genes dissimilarity based on topological overlap were computed by blockwiseModules function using automatic network construction and module detection. C Association between module eigengene and demographic and clinical data. Rows correspond to a module eigengene and column to demographic and clinical data. Cells are color-coded by correlation strength. Cell contains the significance trait/module correlation (top number) and P-value (bottom number). Cells with the highest correlation value are set in bold. D Scatterplot of gene significance vs. module membership. Cholesterol HDL (COLHDL) (left) in the light-yellow module and hemoglobin (right) in the back module. Both scatterplots show high significant correlation between the gene significance and light yellow and black module membership
Gene network with 22 modules identifiedThe gene clustering dendrogram with the assigned module colors is shown in Fig. 4B. For the 22 modules, the minimum module significance (the average absolute gene significance measure for all genes in a given module [8]) is 0.1204, mean is 0.1889, and maximum module significance is 0.3959. Furthermore, we investigated the module-trait associations and found significant associations shown by the coefficient correlation in (Fig. 4C and Table 2). As an example, in Fig. 4C and Table 2, the tan color module is significantly associated with age (r = 0.6), while the black color module is significantly associated with red blood cells, hemoglobin, and hematocrit (r= 0.58). To find the association of individual gene in the network and particular traits, WGCNA uses two measures to identify a set of genes with high significance with clinical traits as well the high module member genes in specific module(s), the gene significance (GS), and module membership (MM) (see “Methods”). Fig. 4D shows the scatterplots of GS vs. MM for two modules; the light-yellow module left panel of Fig. 4D for the COLHDL trait shows a strong and highly significant correlation (r=0.66; P-value 4.9e−11). The black module right panel in Fig. 4D shows also strong and significant correlation (r=0.57; P-value 3.2e−29). Furthermore, blue module and royal blue module in Fig. 4D showed the high correlation and strong P-value for frailty and blood glucose, respectively.
Table 2 DAVID bioinformatics tools functional annotation results for selected WGCNA modules. Correlational coefficient values ≥ 0.5 are set in bold. FDR values for enrichment terms are set in bold for FDR < 0.05Functional annotation of the gene modules with strong significance trait/module correlationOut of the identified 22 gene modules from WGCNA, we selected 7 modules with significant trait/module correlation (Fig. 4C) and performed functional annotation of their module member genes. The results of DAVID [16] functional annotation of the genes in 7 gene modules are shown in Table 2. The gene module light cyan was associated with the frailty (non-frail and frail) which was enriched for the KEGG pathway neutrophil extracellular trap (NET formation (FDR < 0.0001). NETs are a DNA scaffold formed as a defense mechanism by neutrophil. NETs contribute to immobilization and neutralization of different microorganisms [37]. NETs play important detrimental and beneficial roles in inflammation, autoimmunity, and other pathophysiological conditions [38].
Distinct gene module identified by WGCNAThe constructed gene network obtained by the WGCNA is visualized in Fig. 5A. The heatmap in Fig. 5A shows the topological overlap matrix (TOM) of all genes. In the heatmap, the 22 modules from the gene network are shown diagonally with very low overlap between them (indicated by blocks of light color). The heatmap of TOM was visualized together with the gene module dendrogram on top and left side of the heatmap and the module colors at the top and left edge of the heatmap. In addition to the TOM heatmap of all gene network, we identified meta-modules (a group of correlated eigengenes) per each trait. The eigengene dendrogram and the eigengene adjacency heatmap in Supplementary Fig. 5 visualize the relationship among the modules and the clinical traits (one dendrogram and heatmap per each trait).
Fig. 5Topological overlap matrix (TOM) analysis. A Heatmap of the topological overlap matrix on the top and left side gene dendrogram tree and module assignment (colored). In the heatmap, light color represents low overlap genes and darker red color represents higher overlap between genes. Each row and column in the heatmap correspond to a single genes. B Multi-dimensional scaling plot of module members. C Barplot of module significance (black, blue, cyan, red, and dark red (five modules)). The module significance is defined as the mean gene significance across all genes in specific module. For each clinical test (serum creatinine, hemoglobin, BMI, and blood urea nitrogen), two barplots are shown for frail (left) and non-frail (right)
To find the distinct module from the 22 gene modules, we used the multi-dimensional scaling plot (Fig. 5B) which revelated that the blue module is the most distinct module with 1045 genes and module significance (the average absolute gene significance measure for all genes in a given module [8]) of 0.171. We analyzed the module significance for five modules and four clinical traits related to frailty status (serum creatinine, hemoglobin, BMI, and blood urea nitrogen) (Fig. 5C). Each panel in Fig. 5C shows the module significance for the frail and non-frail subjects and p-value. The association between the dark red module gene and serum creatinine was the highest in both frail and non-frail participants, and similarly, blue module shows highly significant association, although P-values are somewhat different between frail and non-frail. For the BMI, the blue module has higher gene significance in non-frail compared to frail. Similar consideration applies to blue module correlations within hemoglobin and blood urea nitrogen (Fig. 5C).
Immune system related pathways and GO terms are enriched in the identified gene modulesTo understand the biological meaning and relevance of the identified gene modules from WGCNA, we performed gene set enrichment analysis (GSEA) of the top enriched genes in the blue module using the anRichment R package, which relays on enrichment annotations from several databases (Reactome pathway database, Molecular Signatures Database v7.4, WGCNA internal collection JAM, HD Molecular Signatures–HDSigDB, and Gene ontology for biological process and cellular compartment). The list of all enrichment terms for each module with their P-value, FDR, and Bonferroni’s correction is shown in Supplementary Table 2. The visualization of enrichment terms for the blue module (distinct module) is shown in Fig. 6A. Most of the enriched terms are related to immune system, e.g., the Reactome terms neutrophil degranulation, immune system, and innate immune system and also the enriched GO BB terms myeloid leukocyte activation, neutrophil activation, leukocyte degranulation, myeloid cell activation involved in immune response, cell activation involved in immune response, and others.
Fig. 6Gene set enrichment analysis of blue module and hub genes identification. A Gene set enrichment analysis of blue module for the top enriched term from Reactome pathway database (red labels), Molecular Signatures Database v7.4 (blue labels), WGCNA internal collection JAM (black label), HD Molecular Signatures–HDSigDB (magenta labels), and Gene ontology BP|CC (yellow labels). B Expression profile of the 20 hub genes in the blue module (non-frail vs. frail); data in the bar chart are represented as expression ± SEM. C Protein-protein-interaction network (PPI) for the PYGL (Glycogen Phosphorylase L) protein coding gene and blue module hub gene
Protein-protein interaction (PPI) network analysis of the hub genes in blue moduleWe export the blue module genes to Cytoscape and identified the hub nodes in the network (genes) (“Methods”). Fig. 6B shows the expression profile of the 20 hub genes in the blue module, which indicated that non-frail subjects have higher mean expression of the hub genes compared to the frail subjects. Finally, we looked at the PPI of the top hub gene Glycogen Phosphorylase L (PYGL) as shown in (Fig. 6C) with PPI enrichment P-value < 1.0e−16 a PYGL code for protein that is involved in cAMP-dependent activation of PKA [14].
Members of ETS (erythroblast transformation-specific) family TF enrichmentWe looked at the enrichment of DNA-binding motifs for the differentially expressed TSSs and active enhancers (see “Methods”). Based on motif enrichment scores, the top 20 ranked motifs are shown in Fig. 7A. Among the top ranked motifs, we identified 3 gene members of the ETS transcription factor family [39]. Those are the ETS-related gene subfamily (ERG and FLI1) and prostate-derived Ets transcription factor subfamily (SPDEF) which play key roles as differentiation regulators and tumorigenesis in endocrine organs and target tissues [39]. ERG (ETS Transcription Factor ERG), FLI1 (Fli-1 proto-oncogene), and SPDEF (SAM Pointed Domain Containing Ets Transcription Factor) are all protein coding genes that play a role as a DNA-binding transcription activator. Also, we found that two members of the basic helix-loop-helix proteins (BHLH) were enriched, the Basic Helix-Loop-Helix Family Member E23 (BHLHE23) and myogenic factor 5 (myf) which is a transcriptional repressor. Among the top 20 ranked motifs, we found the Forkhead box protein O1 (FOXO1) a member of the forkhead family of transcription factors known as longevity associated genes by alterations in the insulin/IGF pathway. The enriched FOXO1 TF is a prolog of FOXO3, a gene that carries polymorphisms that have been strongly associated with longevity in genome-wide association studies [14]. Among the top ranked TF was the nuclear factor I X (NFIX) and nuclear factor I B (NFIB) both belonging to nuclear factor I family (NFI). Both NFIX and NFIB play a key role in development and cancer [40]. The SRY-Box Transcription Factor 9 (SOX9) a member of SRY-box transcription factors (SOX) was enriched as well. SOX9 is a protein coding gene and transcription activator which plays a key role in chondrocytes differentiation and skeletal development [41].
Fig. 7Transcription factors biding site prediction and motifs activity analysis. A 20 predicted transcription factor binding sites (TFBS). Each TFBS was ranked based on the P-value and the target colored by the TF family. The first column is the rank, and the second shows the target name, which is either a gene name, an isoform name, or a dimer name. The next column in the plot is the PWM logo, following the motif ID. This ID comes from the MotifDb package. The next-to-last column is the raw affinity score, and the last column is the P-value of motif enrichment. B Venn diagram of the promoter vs. enhancer transcription factors. C Cumulative distribution (cd) of Pearson’s correlation r for the promoters (left) and enhancers (right) between non-frail and frail. The median value of persons’ correlation r on the horizontal axis of each graph. D log2 expression of the selected transcription factors (TF) involved in age-related pathways between non-frail and frail subjects
Motif activity analysis and aging-regulatory signaling pathway analysisMotif activity of promoter and enhancer TF between non-frail and frailTo investigate the motif activity of promoter and enhancer TF between non-frail and frail, we first estimated TF using the identified TSS and enhancers (“Methods”). Overall, we predicted 159 TFs. 125 of them are common for both promoters and enhancers, 24 are promoter specific TF, and 10 are enhancer specific TF as shown in Venn diagram (Fig. 7B).
We calculated the motif activity for promoter and enhancer (“Methods”). The raw calculated motif activity is shown in Supplementary Tables 3 and 4. The motif activity values for promoters and enhancers are plotted with quantile-quantile (Q-Q) plot in Supplementary Fig. 6. We started by estimating Pearson’s pairwise correlation between non-frail and frail for 149 promoter activity and next we repeat the same procedure and tested Pearson’s correlation between non-frail and frail for 135 enhancer activities (Supplementary Table 5). To test the difference in Pearson’s pairwise correlations between non-frail and frail (promoter and enhancer TF), we computed and plotted the empirical cumulative distribution (CDF) (Fig. 7C). The median value of persons’ correlation r on the horizontal axis of each graph in (Fig. 7C) indicates that more than half of the promoter activity of frail subjects negatively correlated (inverse relationship) with the promoter activity of non-frail subjects (mean r = −0.01). Interestingly, among the top ranked promoters, we found that the Engrailed Homeobox 1 and 2 (EN1,2). EN1,2 a protein coding gene has been implicated in pattern formation during development of the central nervous system and among its enriched pathway dopaminergic neurogenesis [42]. Likewise, for the half of the enhancers, the activity in frail subjects is negatively correlated with the enhancer activity of non-frail subjects (mean r = 0.01) of all enhancers.
CAGE expression of the major aging-regulatory signaling pathwaysAfter defining the set of TFs, we looked at the CAGE expression of the major aging-regulatory signaling pathways and their downstream transcription factors which was curated from [43]. The Box-and-whisker plot in Fig.7D shows the variation in CAGE expression for each category of major aging-regulatory signaling pathways between non-frail and frail.
Comments (0)