Epigenetic Mechanisms Linking Chronic Obstructive Pulmonary Disease and Atrial Fibrillation: A Multi-Omics Mendelian Randomization Study

Introduction

Chronic obstructive pulmonary disease (COPD) and atrial fibrillation (AF) represent two predominant chronic conditions afflicting middle-aged and elderly populations, imposing formidable health and economic burdens on healthcare systems worldwide.1,2 This comorbidity has direct implications for patient outcomes and management, including lower success of rhythm-control strategies, higher bleeding risk with anticoagulation, and increased cardiovascular and all-cause mortality.3,4 These conditions frequently coexist, with epidemiological studies indicating that approximately 13% of AF patients manifest concurrent COPD.5 The incidence of AF among COPD patients exceeds that observed in the general population by 1.23-fold,6 with this association maintaining statistical significance across Asian populations.7 Furthermore, concurrent COPD substantially elevates recurrence rates and adverse event incidence following catheter ablation in AF patients.3 Concurrently, AF patients with comorbid COPD demonstrate significantly heightened risks of hemorrhagic complications, cardiovascular mortality, and all-cause mortality.4 Consequently, investigating COPD-AF comorbidity mechanisms represents a critically important and urgently needed research priority.

Mechanistically, three complementary rationales have been proposed to explain COPD-related AF susceptibility: (i) biological pathways, whereby COPD-associated inflammatory responses and oxidative stress may precipitate structural remodeling8 and electrical conduction alterations,9 thereby amplifying AF susceptibility; (ii) genetic convergence, with hub genes linking COPD and AF, encompassing RPS3, FABP5, and EPHA4,10 alongside common biomarkers—such as CDK8—which may contribute to an adverse cycle via activation of NF-κB-mediated immune-inflammatory cascades;11 and (iii) epigenetic regulation, wherein DNA methylation, as a pivotal modification, may mediate inter-disease associations and provide a tractable layer for mechanistic dissection.12

In recent years, epigenome-wide association studies (EWAS) have furnished novel perspectives for elucidating complex disease molecular mechanisms, with DNA methylation as a pivotal epigenetic modification potentially exerting key roles in inter-disease associations.12 However, previous investigations predominantly remained constrained to association analysis levels, lacking comprehensive exploration of causal relationships. Mendelian randomization analysis predicated on methylation quantitative trait loci (mQTL) can leverage EWAS-identified disease-related CpG sites as exposures, inferring causal associations with health outcomes through genetic instrumental variables, thereby providing robust analytical tools for investigating epigenetic-mediated mechanisms between diseases.13 Simultaneously, integrated analysis of multi-omics data has emerged as a pivotal strategy for revealing complex disease regulatory networks. Cross-validation through DNA methylation, gene expression, and protein abundance levels can furnish more reliable biological evidence.14

Although existing epidemiological, basic research, and observational studies demonstrate associations between COPD and AF, underlying molecular mechanisms remain incompletely elucidated. Therefore, this investigation utilized large-scale population EWAS, quantitative trait loci (QTL), and genome-wide association study (GWAS) datasets, within a summary-data-based Mendelian randomization framework complemented by two-sample Mendelian randomization and colocalization, and employed multi-omics integration strategies to systematically explore causal linkages between COPD and AF across DNA methylation, gene expression, and protein abundance levels. We endeavored to screen candidate genes potentially mediating this association based on integrated omics evidence, aiming to provide novel insights for comprehensive understanding of COPD-AF comorbidity molecular mechanisms and developing innovative prevention and therapeutic strategies.

Materials and Methods Study Design and Data Sources Research Framework

This investigation constituted a comprehensive secondary analysis employing publicly accessible summary-level EWAS, GWAS, and QTL datasets. The analytical framework integrated epigenetic and multi-omics methodologies, utilizing genetic instrumental variables to mitigate confounding influences and combining multi-omics data to identify potential epigenetic pathways mediating COPD to AF associations (Figure 1). The research workflow encompassed: (1) assessment of genetic correlations between COPD and AF; (2) exploration of preliminary genetically predicted causal associations; (3) epigenetic mechanism analysis (core research focus); (4) multi-level integrative validation; (5) functional annotation and therapeutic target prediction.

Figure 1 The overall design of this study.

Abbreviations: COPD, chronic obstructive pulmonary disease; AF, atrial fibrillation; mQTL, methylation quantitative trait loci; eQTL, expression quantitative trait loci; pQTL, protein quantitative trait loci; SMR, summary-data-based Mendelian randomization; PPI, protein–protein interaction.

Data Sources

All EWAS, GWAS, and QTL datasets comprised individuals of European ancestry; dataset-level ancestries are summarized in Table 1 and Supplementary Table S1.

Table 1 Summary of Datasets, Sample Sizes, and Ancestry

i) GWAS Data: COPD GWAS data were procured from Zhou W et al’s comprehensive investigation,15 encompassing 12 cohorts including FinnGen and UK Biobank repositories. The FinnGen cohort utilized ICD-10 codes J43 and J44, ICD-9 codes 492 and 4912, and ICD-8 codes 492 and 49104 for COPD diagnosis. Lifelines Biobank employed spirometry-based diagnostic criteria (FEV1/FVC × 100 < 70). HUNT, MGI, UCLA Precision Health Biobank, and UK Biobank employed phecode-based diagnoses, comprising 58,559 cases and 937,358 controls. AF GWAS data originated from FinnGen R11, utilizing ICD-10 code I48 definitions, encompassing 55,853 cases and 231,952 controls. Right heart-related phenotypic data were derived from UK Biobank cardiac MRI measurements following Pirruccello et al’s methodology, encompassing 41,135 subjects with measurements including pulmonary artery-to-aortic diameter ratio, pulmonary artery-to-aortic diastolic diameter ratio, right atrial maximum area, right atrial minimum area, right atrial area change fraction (RAFAC), right ventricular end-diastolic volume (RVEDV), right ventricular end-systolic volume (RVESV), right ventricular stroke volume (RVSV), right ventricular ejection fraction (RVEF), and pulmonary artery diameter.

ii) Epigenetic Data: COPD-EWAS summary statistics were obtained from published literature sources.16 CpG sites with FDR < 0.05 were defined as “COPD-associated” DNA methylation sites, yielding 155,862 sites for identifying COPD-related DNA methylation CpG sites. Methylation QTL (mQTL) data were sourced from: (1) McRae et al’s investigation (n = 1,980 Europeans);17 (2) Min et al’s (GoDMC) study (n = 27,750 Europeans).18

iii) Gene Expression Data: (1) Expression QTL (eQTL) from the eQTLGen consortium (n = 31,684 Europeans); (2) tissue-specific data from the Genotype-Tissue Expression Consortium (GTEx V8), including atrial appendage tissue (n = 429 Europeans), left ventricular tissue (n = 432 Europeans), whole blood tissue (n = 755 Europeans), and lung tissue (n = 578 Europeans), representing cis-eQTL data exclusively.

iv) Protein Expression Data: (1) Protein QTL (pQTL) from Sun et al (n = 3,301); (2) pQTL from Pietzner et al (n = 10,708 Finns); (3) pQTL from Folkersen et al (n = 30,931); (4) pQTL from Ferkingstad et al (n = 35,559 Icelanders); (5) pQTL from Benjamin B. Sun et al (n = 34,557 Europeans). For MR analysis pQTL data selection: we selected UKB and deCODE database pQTL data, utilizing UKB as the primary analysis when proteins were available, with deCODE data serving as sensitivity validation; proteins exclusively available in deCODE were designated as secondary studies for sensitivity analysis. All original GWAS/QTL analyses were adjusted for age, sex, and the first 10 genomic principal components in their respective studies; summary statistics employed in this investigation were directly retrieved without additional adjustments.

Additionally, we queried UK Biobank cardiac MRI phenotypes for left atrial structural measures (eg, left atrial diameter/volume). These measures were not available as standardized fields in the imaging-derived dataset used here; therefore, left atrial diameter–based analyses were not performed. We also screened UK Biobank and public GWAS resources for pulmonary hypertension, but available datasets had insufficient case numbers to support adequately powered two-sample MR with our instruments; consequently, pulmonary hypertension was not analyzed as an outcome.

Genetic Correlation Analysis

Linkage disequilibrium score regression (LDSC) analysis was executed using the R package “ldscr” (version 4.3.1) to estimate genetic correlation (rg) between COPD and AF based on GWAS summary statistics. LDSC analysis employed the 1000 Genomes Project European population as the reference LD panel, calculating standardized LD scores. P < 0.005 was considered statistically significant genetic correlation, with P values between 0.005 and 0.05 indicating suggestive associations.19

Two-Sample Mendelian Randomization Analysis

Two-sample Mendelian randomization (MR) analyses were conducted using the R package TwoSampleMR (v0.5.7). MR rests on three core assumptions: (i) relevance—the genetic instruments (IVs) are associated with the exposure; (ii) independence—the IVs are independent of confounders of the exposure–outcome relationship; and (iii) exclusion restriction—the IVs influence the outcome only via the exposure (ie, no horizontal pleiotropy). Because horizontal pleiotropy can bias causal estimates, we applied multiple complementary MR estimators to ensure robustness to horizontal pleiotropy and outliers. Specifically, we implemented MR-Egger,20 Weighted Median,21 Inverse-Variance Weighted (IVW),22 Simple Mode,21 and Weighted Mode.23 IVW provides the most precise estimates under no directional pleiotropy; the Weighted Median yields a consistent estimate if ≥50% of the total weight comes from valid instruments; MR-Egger allows for directional pleiotropy via a non-zero intercept under the InSIDE assumption; and mode-based estimators assume that the largest cluster of similar ratio estimates is formed by valid instruments. For mQTL, eQTL, and pQTL analyses, IVW was used when multiple SNP instruments were available, and the Wald ratio was used when only a single SNP instrument was present.

Instrument selection criteria were: (1) genome-wide significant association with the exposure (p < 5×10−8); (2) LD independence (r2 < 0.001 within a 10,000 kb window) with LD clumping and screening performed using PLINK v1.90b7 (64-bit);24 (3) F-statistic > 10 to limit weak instrument bias, with F = (N − 2) × R2/(1 − R2);25 (4) exclusion of SNPs directly associated with AF at genome-wide significance (p < 5×10−8); and (5) harmonization and SNP handling followed TwoSampleMR v0.5.7 defaults (action=2): effect alleles were aligned across datasets; strand issues were corrected by complementing where applicable; palindromic A/T or C/G SNPs with intermediate allele frequency were excluded; and SNPs with unresolved allele conflicts were removed. Outcome variants were additionally filtered to 0.01 < EAF < 0.99. In this analysis, no selected instruments were missing in the outcome dataset, and no LD proxy replacement was applied.

Significant cis-mQTLs were defined as variants located within <1 Mb of CpG sites with P < 1×10−8. For pQTLs, deCODE cis-pQTL data included SNPs within ±100 kb (excluding the MHC region), and UK Biobank pQTL data likewise used ±100 kb windows with MHC region variants removed. Because many CpG-site MR analyses relied on single cis-SNP instruments, these results are primarily hypothesis-generating and should be interpreted cautiously.

We controlled multiple testing using the Benjamini–Hochberg procedure (FDR < 0.05). Heterogeneity across instruments was evaluated using Cochran’s Q (p < 0.05 indicating heterogeneity); random-effects IVW was applied when heterogeneity was present, otherwise fixed-effects IVW was used. Directional (unbalanced) pleiotropy was assessed with the MR-Egger intercept (p < 0.05 indicating pleiotropy), and MR-PRESSO was used to detect and correct outlier SNP effects. Steiger directionality tests were conducted to confirm the exposure→outcome direction and reduce the possibility of reverse causation. Leave-one-out analyses were performed to evaluate the influence of individual instruments on overall estimates. We considered an MR association robust when estimates were directionally consistent across multiple estimators, the MR-Egger intercept did not indicate pleiotropy, no residual MR-PRESSO outliers remained, and fixed- and random-effects IVW results were comparable.

Summary-Data-Based Mendelian Randomization Analysis (SMR)

SMR methodology was employed to initially assess statistical associations between CpG sites and AF at the DNA methylation level utilizing mQTL data. Subsequently, for significantly associated CpG sites, eQTL and pQTL data were utilized to explore associations with AF at gene and protein expression levels. For each QTL type, the most statistically significant cis-acting variant (top cis-variant) served as the instrumental variable. The Benjamini–Hochberg method was applied for multiple testing correction, with FDR < 0.05 considered statistically significant. All significant SMR results (FDR < 0.05) underwent heterogeneity testing (HEIDI) (P_HEIDI > 0.05) to exclude potential linkage disequilibrium (LD) effects. SMR and HEIDI analyses were completed using the online SMR portal (https://yanglab.westlake.edu.cn/smr-portal/, accessed on [2024–11-10]).

Colocalization Analysis

Colocalization analysis was performed for significant sites (p_SMR_FDR < 0.05, p_HEIDI > 0.05) in SMR analyses of mQTL, eQTL, and pQTL with AF utilizing the R package “coloc” (version 5.2.3). Colocalization analysis employed a Bayesian statistical framework to evaluate posterior probabilities of two phenotypes sharing identical causal variants. The analysis considered five hypotheses: (1) PPH0: no association at the locus; (2) PPH1: association exclusively with COPD; (3) PPH2: association exclusively with AF; (4) PPH3: distinct causal variants for the two phenotypes; (5) PPH4: shared causal variant for both phenotypes.26 Default prior parameters (p1 = 1×10−4, p2 = 1×10−4, p12 = 1×10−5) were utilized, executing within ±1 Mb range around CpG sites. Posterior probability PPH4 > 0.70 was considered robust statistical evidence supporting shared causal variants, based on Claudia Giambartolomei et al’s research.27 The “locuscomparer” package (version 1.0.0) generated regional association comparison plots for intuitive visualization of shared signals between phenotypes.

Protein–Protein Interaction (PPI) and Functional Enrichment Analysis

Protein–protein interaction networks estimated correlations among COPD- and AF-related proteins (p_SMR_FDR < 0.05) utilizing the Search Tool for Retrieval of Interacting Genes (STRING, https://string-db.org/)28 and GeneMANIA (https://genemania.org/).29

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses evaluated relationships between potential enriched pathways and proteins. R language and related packages including “GO.db” (v3.18.0), “clusterProfiler” (v4.10.1), “AnnotationHub” (v3.10.1), “org.Hs.eg.db” (v3.18.0), “org.Mm.eg.db” (v3.18.0), “DOSE” (v3.28.2), “ReactomePA” (v1.46.0), “limma” (v3.58.1), “circlize” (v0.4.16), “RColorBrewer” (v1.1–3), “ComplexHeatmap” (v2.18.0), and “ggplot2” (v3.5.1) were employed for enrichment analysis and result visualization.

Drug Prediction and Molecular Docking

To translate the statistical signals into therapeutic hypotheses, we integrated prioritized targets from the MR/eQTL/pQTL and expression analyses into a two-step in-silico workflow: (i) drug prediction to retrieve compounds with known mechanisms acting on the prioritized genes, and (ii) structure-based molecular docking to assess the physical plausibility of drug–target interactions. These in-silico analyses are hypothesis-generating, intended to rank candidates for experimental validation.

Drug Prediction

The Drug–Gene Interaction Database (DSigDB) was utilized to evaluate potential protein–drug interactions, predicting candidate pharmaceuticals targeting genes indicated at gene expression and protein expression levels. DSigDB comprises a comprehensive repository containing 22,527 genomic signatures and 17,389 distinct compounds, encompassing 19,531 gene sets connecting drugs and compounds with their target genes.30 Screening criteria included: (1) drug-gene pairs with established mechanisms of action. For each gene, potential drug numbers, action types (inhibitors, agonists, etc.), and clinical application status were systematically recorded.

Molecular Docking

Molecular docking was subsequently utilized to evaluate predicted interactions between candidate drugs and their targets. This approach facilitates prioritization of drug targets through identifying protein characteristics and protein–drug interaction patterns, providing foundations for future experimental validation.

Component structures were obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/), preserved in SDF format, imported into Chem3D 19.0 software, and energy-minimized using the MM2 module to obtain the lowest-energy dominant conformation, preserved in mol2 format and converted to pdbqt files using AutodockTools 1.5.7. Three-dimensional structures of core target proteins were obtained from the Uniprot database (https://www.uniprot.org/), preserved as PDB structure files, and processed using AutodockTools 1.5.7 to incorporate hydrogen atoms, defining them as receptors and preserving as pdbqt files. The vina plugin was employed for ligand-receptor docking, with visualization analysis performed using PyMOL 3.2 software following docking completion.

Statistical Analysis

All statistical analyses were completed using R (version 4.3.1). Multiple p-value calculations employed the Benjamini–Hochberg method for false discovery rate control, with FDR significance threshold established at 0.05, explicitly reporting effect estimates and confidence intervals.31

Reporting Standards

This investigation adhered to STROBE-MR guidelines, with the STROBE-MR checklist provided in Supplementary Table S49.

The language in this manuscript was polished using Claude sonnet 4 solely for grammatical refinement and clarity enhancement, with all scientific content and interpretations remaining entirely our own.

Results Genetic Correlation Between COPD and AF

LDSC genetic correlation analysis established significant genetic correlation between COPD and AF (rg = 0.193, p = 1.04×10−9), notwithstanding relatively modest heritability estimates for both conditions (COPD: h2 = 0.0161, p < 1×10−50; AF: h2 = 0.0722, p < 1×10−40) (Supplementary Table S2).

Genetic Association and Mendelian Randomization Analysis Between COPD and AF

Two-sample Mendelian randomization (MR) analysis demonstrated that COPD significantly elevated AF risk (OR = 1.156, 95% CI: 1.025–1.304, p = 0.018) (Supplementary Table S3). Sensitivity analyses corroborated this potential causal association (Supplementary Table S4-S5). Following MR-PRESSO exclusion of instrumental variables exhibiting pleiotropy (Supplementary Table S6), COPD remained significantly associated with AF risk (OR = 1.179, 95% CI: 1.073–1.296, p = 0.00065) (Supplementary Table S7 and Supplementary Figure 1), with no horizontal pleiotropy detected (MR Egger intercept test p = 0.696) (Supplementary Figures 2–3) Furthermore, leave-one-out analysis validated result robustness (Supplementary Figure 4) (Supplementary Tables S8–S9). Steiger directionality test precluded reverse causation (p = 3.69×10−19) (Supplementary Table S10).

Epigenetic Analysis of Molecular Mechanisms Underlying COPD’s Impact on AF Identification of COPD-Associated DNA Methylation Sites

Through comprehensive epigenome-wide association study (EWAS), we identified 155,862 CpG sites demonstrating significant associations with COPD (FDR < 0.05) (Supplementary Table S11), ultimately extracting mQTL corresponding to 2,605 CpG sites from McRae mQTL data.

Causal Association Between COPD-Related Methylation Sites and AF

Utilizing stepwise SMR methodology, we established linkage between 109 CpG sites and atrial fibrillation (AF) through genetic instrumental variables (p_SMR_FDR < 0.05), with all sites successfully passing HEIDI testing (p_HEIDI > 0.05), representing single instrumental variable-based analyses. This furnished statistical evidence for potential causal associations whereby methylation status of these 109 sites through genetic regulation might influence AF risk (p_SMR_FDR < 0.05, p_HEIDI > 0.05) (Figure 2A–C and Supplementary Table S12). Among these, 53 CpG sites exhibited positive correlation between methylation levels and AF risk, encompassing ATF5 (cg04315771) and RPS2 (cg02871659); 56 sites demonstrated negative correlation, such as FKBP2 (cg16977872) and VSTM4 (cg07260003). Forty-eight CpG sites demonstrated consistent β-value directions between COPD EWAS data and mQTL-SMR results, whereas 61 CpG sites exhibited opposite directions (Figures 3A–C, 4A–C and Supplementary Tables S13-S14).

Figure 2 Epigenome-wide analysis of COPD-associated DNA methylation sites in relation to AF risk. (A) Manhattan plot of SMR analysis between COPD-associated mQTLs andAF, marking the genes with significant CpG sites (FDR < 0.05). (B) Volcano plot depicting effect sizes (β) and significance levels of SMR results for COPD-associated methylation sites and AF risk. (C) The forest plot shows the top methylation sites that exhibited significant associations when conducting an SMR analysis of COPD methylation and AF. In the plot, small triangular symbols represent odds ratios (OR), and straight lines represent 95% confidence intervals (CI). FDR = false discovery rate.

Figure 3 Integrative analysis of genes with consistent effect directions between COPD EWAS data and mQTL-SMR results. (A) Manhattan plot displaying SMR analysis results of COPD-associated mQTLs with consistent effect directions in relation to AF risk, marking the genes with significant CpG sites (FDR < 0.05). (B) The volcano plot depicts the effect size (β) and significance level of the SMR results for the methylated sites with consistent β values related to COPD and the AF risk. (C) The forest plot shows the main methylated sites that demonstrated significant association when performing SMR analysis on the CpG sites and AF that were directionally consistent with the β values related to COPD. In the plot, small triangular symbols represent odds ratios (OR), and straight lines represent 95% confidence intervals (CI).

Abbreviations: FDR, false discovery rate.

Figure 4 Analysis of genes with opposite effect directions between COPD EWAS data and mQTL-SMR results. (A) Manhattan plot displaying SMR analysis results of COPD-associated mQTLs with opposite effect directions in relation to AF risk, marking the genes with significant CpG sites (FDR < 0.05). (B) The volcano plot depicts the effect size (β) and significance level of the SMR results for the36793methylated sites with opposite β values related to COPD and the AF risk. (C) The forest plot shows the main methylated sites that demonstrated significant association when performing SMR analysis on the CpG sites and AF that were directionally opposite with the β values related to COPD. In the plot, small triangular symbols represent odds ratios (OR), and straight lines represent 95% confidence intervals (CI).

Abbreviations: FDR, false discovery rate.

To validate SMR results, we conducted two-sample MR analysis on 101 of the 109 positive CpG sites amenable to MR analysis (Supplementary Table S15). Among 46 CpG sites with consistent β-value directions between COPD EWAS data and mQTL-SMR results, MR analysis indicated 24 CpG sites demonstrated significant positive or negative associations with AF risk (p_FDR < 0.05), with 11 CpG sites exhibiting significant positive associations (beta > 0) and 13 sites demonstrating significant negative associations (beta < 0) (Supplementary Table S16). Among 55 CpG sites with inconsistent β-value directions between COPD EWAS data and mQTL-SMR results, MR analysis indicated 31 CpG sites demonstrated significant positive or negative associations with AF risk (p_FDR < 0.05), with 18 sites exhibiting significant positive associations (p_FDR < 0.05, beta > 0) and 13 sites demonstrating significant negative associations (p_FDR < 0.05, beta < 0) (Supplementary Table S17). Steiger directionality tests excluded reverse causation possibilities (Supplementary Tables S18–S20).

Colocalization Validation of Methylation-AF Associations

Subsequent colocalization analysis utilizing GoDMC data revealed 25 CpG sites (including cg04510874 in the FES gene region) with PP.H4 > 0.7, suggesting their function as shared causal variant sites linking COPD with AF (Figure 5A–D, Supplementary Figure 5A–5S and Supplementary Table S21), representing novel molecular-level discoveries of association mechanisms. Among CpG sites with consistent β-value directions between COPD EWAS data and mQTL-SMR results, 14 sites demonstrated PP.H4 > 0.7, with 9 sites exhibiting high colocalization probability (PP.H4 > 0.9), encompassing FES (cg04510874), FKBP2 (cg16977872), ATF5 (cg04315771), VSTM4 (cg07260003), and LPIN3 (cg24845165) (Figure 6A–D and Supplementary Table S22). Among CpG sites with inconsistent β-value directions between COPD EWAS data and mQTL-SMR results, 11 sites demonstrated PP.H4 > 0.7 (Figure 7A–D and Supplementary Table S23).

Figure 5 Co-localization analysis of COPD-associated DNA methylation sites in relation to AF risk. (A) Colocalization analysis results for significant SMR findings, with posterior probability (PP.H4) values indicating shared causal variants between methylation sites and AF. (B-D) Regional association plots for the three CpG sites with highest colocalization evidence 35771 (PP.H4 > 0.9), associated with ATF5, FES, and VSTM4 genes, demonstrating shared genetic signals between methylation quantitative trait loci and AF risk.

Figure 6 Co-localization analysis of genes with consistent effect directions between COPD EWAS data and mQTL-SMR results. (A) Colocalization analysis results for methylation sites with consistent β-direction, with higher PP.H4 values indicating stronger evidence for shared causal variants. (B-D) Regional association plots for ATF5, FES, and VSTM4, the three genes demonstrating the highest colocalization evidence (PP.H4 > 0.9), suggesting their roles as key mediators in the COPD-AF causal cascade.

Figure 7 Co-localization analysis of genes with opposite effect directions between COPD EWAS data and mQTL-SMR results. (A) Colocalization analysis results for methylation sites with opposite β-direction, with higher PP.H4 values indicating stronger evidence for shared causal variants. (B–D) Regional association plots for SLC45A3, KLF11, and SLC22A17, the three genes with opposite effect directions showing the highest colocalization evidence, suggesting their involvement in masking effect mechanisms between COPD and AF.

Investigation of the EWAS Atlas website (https://ngdc.cncb.ac.cn/ewas/atlas/index) among the 25 colocalization-positive CpG sites revealed that RPS2 (cg02871659), KLF11 (cg05301188), KMT5A (cg05677062), FKBP2 (cg16977872), and PITPNA (cg20426671) were associated with smoking exposure, whereas remaining CpG sites exhibited no significant smoking associations. All CpG sites were unrelated to harmful particulate matter exposure (PM2.5, PM10).

Multi-Omics Evidence Validation of Important Pathways

Building on pivotal genes discovered at the methylation level, we integrated transcriptomic and proteomic data to triangulate putative mediators linking COPD to AF. We retain detailed eQTL-SMR results in Tables 1–3 in the main text for transparency and readability, with extended statistics in Supplementary Tables S24–S35.

Table 2 SMR Analysis of Gene Expression Associations with AF for Significant COPD-Related Methylation Sites

Table 3 SMR Analysis of Gene Expression Associations with AF for Methylation Sites with Consistent Effect Directions

Gene Expression Level Validation

For genes significant at the methylation level, we performed eQTL-SMR analysis across GTEx and eQTLGen datasets (Table 1 and Supplementary Table S24). Table 1 summarizes representative gene–tissue associations with AF, using FDR control for p_SMR and HEIDI p > 0.05 as consistency with colocalization. At a glance, FES showed inverse associations with AF across eQTLGen and GTEx (atrial appendage, lung); ATF5 showed inverse associations in atrial appendage and left ventricle; VSTM4 showed a positive association in left ventricle; LPIN3, SLC22A17, and SLC45A3 showed positive associations in eQTLGen and/or GTEx lung (Table 2).

To align methylation and expression directions, we stratified genes by whether CpG effects in COPD EWAS and mQTL-SMR pointed in the same (consistent) or opposite (inconsistent) direction. Among genes with consistent directions (Table 3 and Supplementary Table S25), six genes showed expression levels associated with AF risk in at least one tissue. Integrating methylation and expression directions, four genes maintained significant associations, and two were validated by MR. Specifically, in atrial appendage, ATF5 (OR: 0.926, 95% CI: 0.854–0.998) and FES (OR: 0.772, 95% CI: 0.611–0.933) exhibited negative correlations with AF. In left ventricle, ATF5 (OR: 0.932, 95% CI: 0.869–0.996) was negatively correlated with AF, while VSTM4 (OR: 1.098, 95% CI: 1.047–1.149) was positively correlated (Supplementary Tables S26–S27). Colocalization supported shared signals for FES across GTEx atrial appendage, lung, whole blood, and eQTLGen, and for ATF5 in GTEx left ventricle and lung (PP.H4 > 0.9; Supplementary Table S28).

Among genes with opposite methylation–expression directions (Table 4 and Supplementary Table S29), four genes showed expression levels associated with AF in at least one tissue. After integrating methylation and expression directions, two genes maintained significant associations, both showing negative results in MR (Supplementary Tables S30–S31). Colocalization analysis indicated robust evidence for SLC45A3 with eQTLGen (PP.H4 > 0.9; Supplementary Table S32).

Table 4 SMR Analysis of Gene Expression Associations with AF for Methylation Sites with Opposite Effect Directions

Protein Level Validation

At the protein level, ICAM4 abundance was positively associated with AF (pQTL-SMR; MR replicated in deCODE), whereas FES protein levels showed an inverse association with AF in UK Biobank-based MR (Supplementary Tables S33–S35). These protein-level findings converge with the expression-level directions for FES and suggest an orthogonal protein-based signal for ICAM4.

Multi-Level Evidence Integration and Functional Stratification

Integrating evidence across methylation, gene expression, and protein levels, we classified candidates into three functional tiers (Supplementary Tables S24–S35):

Tier 1 (highest): FES—consistent signals across methylation (positive direction), expression (negative), and protein (negative), with colocalization and MR support.

Tier 2 (moderate): ATF5, VSTM4, LPIN3—supported by methylation plus ≥1 additional omics layer or MR consistency in at least one tissue/dataset.

Tier 3 (suggestive/masking): ICAM4, SLC22A17, SLC45A3—signals present but limited to specific omics layers and/or with opposite directions between methylation and expression, suggesting masking or tissue/context dependencies.

Integrative Summary of Candidate Genes (with Clinical Significance)

Table 5 summarizes cross-omics directions, colocalization, MR replication, and tier assignments. Effect sizes correspond to modest odds-ratio shifts typical for complex traits (approximately OR 0.77–1.10 for key signals per SD/log-odds scale), implying small individual-level risk changes but potential population relevance via pathway-level modulation.

Table 5 Integrative Summary of Candidate Genes Across Methylation, Expression, and Protein Layers

Right Heart Phenotype Mediation Analysis

To explore intermediate links in the COPD–methylation–AF pathway, we performed a two-step MR using right-heart imaging phenotypes as candidate mediators (Figure 8 and Supplementary Table S36-S43). First, cg18054172 (HOXC5) methylation was inversely associated with the pulmonary artery/aortic (PA/Ao) diameter ratio (beta = -0.073, p = 1.93×10−4, 95% CI: -0.112 to -0.035), remaining significant after multiple testing (BH-adjusted q = 0.0487). Second, higher genetically proxied PA/Ao ratio was associated with lower AF risk beta = -0.214, SE = 0.0798, p = 0.0072, 95% CI: -0.371 to -0.058), with a consistent weighted median estimate (beta = -0.179, p = 0.046) and a directionally similar but non-significant MR-Egger slope (beta = -0.422, p = 0.36). The combination of these two negative paths implies a positive indirect effect of HOXC5 methylation on AF risk via the PA/Ao ratio. Consistent with the upstream association of COPD with higher HOXC5 methylation (beta = 0.019), the overall COPD→methylation→PA/Ao→AF chain points to a positive mediated effect on AF risk. We note that this inverse PA/Ao→AF direction differs from some observational reports; because the PA/Ao ratio integrates both pulmonary artery and aortic diameters, the MR signal may be influenced by denominator (aortic) variation, and thus should be interpreted cautiously pending component-specific analyses.

Figure 8 Two-step MR schematic linking COPD, HOXC5 methylation (cg18054172), the PA/Ao ratio, and AF. Vertical schematic illustrating the hypothesized chain from COPD to AF via HOXC5 methylation and the pulmonary artery/aortic (PA/Ao) diameter ratio. Links are labeled only with effect direction (Positive/Negative); no effect sizes are shown, and AF effects are on the log-odds scale. In our data, Step 1 (methylation to PA/Ao) is Negative and Step 2 (PA/Ao to AF) is Negative; the product-of-coefficients implies a positive indirect effect of HOXC5 methylation on AF via PA/Ao. Two-sample, two-step MR was used with mQTL instruments for cg18054172, imaging GWAS for PA/Ao, and AF GWAS; inverse-variance weighting provided primary estimates with sensitivity analyses as noted.

PPI and Enrichment Analysis

Protein–protein interaction analysis of genes corresponding to CpG sites with consistent β-value directions between COPD EWAS data and mQTL-SMR results revealed functional interconnectivity including co-expression patterns (Figure 9A). STRING network analysis identified no meaningful functional connections (Supplementary Figure 6). GO enrichment analysis revealed that these genes were predominantly enriched in immune response mechanisms, signal transduction processes, metabolic pathways, developmental differentiation, cellular structural organization, and transport-related biological processes (Figure 9B–E and Supplementary Table S44). KEGG enrichment identified significant signaling pathways including lysine degradation and hormonal signaling pathway regulation (Figure 9F–H and Supplementary Table S45).

Figure 9 Functional analysis of genes with consistent effect directions between COPD EWAS and mQTL-SMR results. (A) Protein–protein interaction network of genes with consistent β-value directions, demonstrating functional connections and co-expression patterns.The purple line in the graph: Networks: Co-expression. (B) Circular representation of GO enrichment analysis for mediating effect genes, highlighting immune response, signal transduction, and metabolic pathway involvement. (C) Gene functional association network based on GO analysis, illustrating interconnections between biological processes. (D) Bar plot of significantly enriched GO terms for mediating effect genes. (E) Bubble plot of GO enrichment results, with larger bubbles indicating more genes associated with each term. (F) Gene functional association network based on KEGG pathway analysis, revealing connections between signaling cascades. (G) Bar plot of significantly enriched KEGG pathways, including lysine degradation and hormonal signaling regulation. (H) Bubble plot of KEGG pathway enrichment, with significance37815level indicated by color intensity.

Protein–protein interaction analysis of genes corresponding to CpG sites with opposite β-value directions between COPD EWAS data and mQTL-SMR results revealed functional interconnectivity patterns (Figure 10A). STRING network analysis identified 63 nodes and 12 functional connections (Supplementary Figure 7), revealing potential collaborative participation in specific biological processes. GO functional enrichment analysis demonstrated these genes were primarily associated with cardiac and vascular development-related pathways, including atrial development, vascular smooth muscle cell differentiation, Notch signaling pathway in cardiac development, and atrial septal development. Additionally, enrichment in steroid and hormone metabolism regulation processes was observed (Figure 10B–E and Supplementary Table S46). KEGG pathway analysis identified significantly enriched signaling pathways including Notch signaling pathway (Figure 10F–H and Supplementary Table S47).

Figure 10 Functional analysis of genes with opposite effect directions between COPD EWAS and mQTL-SMR results. (A) Protein–protein interaction network of genes with opposite β-value directions, demonstrating functional connections and co-expression patterns. Networks (presented as lines in the diagram): Purple: Co-expression; Orange: Predicted; Red: Physical Interactions; Green: Genetic Interactions; Yellow-green: Shared protein domains.Functions (presented by the color inside the circle in the diagram):Red: Iron ion transport; Blue: Transition metal ion transport; Orange: ATPase-coupled transmembrane transporter activity; Purple: Primary active transmembrane transporter activity; Green: Active ion transmembrane transporter activity; Light purple: Phagosome maturation; Turquoise: Intracellular pH reduction. (B) Circular representation of GO enrichment analysis for masking effect genes, highlighting cardiac development and vascular-related pathways. (C) Gene functional association network based on GO analysis, illustrating interconnections between development-related processes. (D) Bar plot of significantly enriched GO terms, showing predominance of cardiac and vascular development pathways. (E) Bubble plot of GO enrichment results for masking effect genes. (F) Gene functional association network based on KEGG pathway analysis, emphasizing Notch signaling pathway connections. (G) Bar plot of significantly enriched KEGG pathways, including Notch signaling and mTOR pathways. (H) Bubble plot of KEGG pathway enrichment for masking effect genes, suggesting involvement in cardiac development regulation.

Figure 11 Molecular docking analysis of candidate drugs with target proteins identified through multi-omics integration. Visualization of molecular docking results for FES protein with four candidate compounds showing high binding affinity: FES-Quinpirole, FES-Emetine, FES-Cephaeline, and FES-Alprostadil. Each panel displays the three-dimensional protein structure with the compound in its predicted binding pose, highlighting key interaction sites.

These functional analysis results suggest COPD might influence AF risk through epigenetic regulation of cardiac development and angiogenesis-related genes, particularly through Notch and mTOR signaling pathway mediation.

Candidate Drug Prediction and Molecular Docking

Based on multi-omics integration-identified 4 moderate-to-high evidence mediating genes, DSigDB database predicted 222 potential therapeutic compounds. FES and ATF5 demonstrated significant associations with the most candidate drugs, including Emetine, Cephaeline, Alprostadil, Quinpirole, Gossypol, Irinotecan, Deptropine, Daunorubicin, Ajmaline, Minocycline, and Chlorhexidine (Supplementary Table S48).

AutoDock molecular docking analysis identified 11 drug–protein interactions displaying high binding affinity (Figure 11, Table 6 and Supplementary Figure 8A-8H). ATF5 with Gossypol [PMID: 3503] exhibited the lowest binding energy (-7.1 kcal/mol), demonstrating the most thermodynamically stable binding conformation (Table 6). Molecular docking results indicated this compound formed 2 hydrogen bonds with ATF5 active sites, suggesting potential for effective ATF5 functional regulation. These findings provide potential targets for developing epigenetic regulatory drugs targeting AF risk in COPD patients, particularly through FES and ATF5-mediated epigenetic modifications.

Table 6 Comparative Analysis of 38837 Binding Energy Scores for All Predicted Drug–Protein Interactions, with Lower Scores Indicating More Thermodynamically Stable Binding Conformations

Discussion

To our knowledge, this is the first study to systematically integrate EWAS with multi-layer QTLs (mQTL, eQTL, pQTL) and causal inference tools (SMR, two-sample MR, HEIDI, and colocalization) to investigate the COPD–AF comorbidity and to prioritize putative mediators by cross-omics convergence. Through SMR, we identified 7 candidate genes with multi-omics evidence, encompassing 4 mediating effect genes and 3 masking effect genes. Among these, FES emerged as the most evidence-supported pivotal mediating gene, validated across multiple levels and demonstrating consistent alterations across DNA methylation, gene expression, and protein levels, providing preliminary human genetic support for its potential as a therapeutic target. Additionally, 7 genes including ATF5, ICAM4, and SLC22A17 function at minimally two regulatory levels, representing genes associated with incident AF in COPD. Compared to FES, these candidate targets should be considered preliminary findings necessitating validation through additional independent investigations. In contrast to prior COPD–AF studies that were predominantly observational or single-omics, our framework requires cross-layer consistency and colocalization (with protein-level replication where available), exemplified by FES convergence, ICAM4 protein replication, and a two-step MR illustrating a right-heart mediation chain (HOXC5 methylation → PA/Ao → AF).

Multiple COPD-related mechanisms may predispose to AF, including blood-gas disturbances (hypoxemia, hypercapnia), sympathetic activation, pulmonary hypertension with right-heart remodeling,32 chronic inflammation/oxidative stress (eg, Nox2) leading to atrial fibrosis,8,33 electrophysiological remodeling (prolonged atrial electromechanical delay, increased P-wave dispersion),9 and medication effects such as prolonged β2-agonist use.34 These observations support a close COPD–AF link and are consistent with our discovery of significant genetic correlation and unidirectional causal associations.

Our MR analysis demonstrated that COPD significantly elevates atrial fibrillation risk, with Steiger directionality testing further corroborating this causal association. Previous studies suggested potential shared genetic susceptibility between COPD and AF,35 consistent with our findings, suggesting that incident AF in COPD patients may involve underlying genetic mechanisms, providing crucial clues for subsequent epigenetic research. However, research findings regarding COPD-AF causal relationships are not entirely consistent. Guangzan Yu et al’s two-sample Mendelian randomization analysis demonstrated no significant causal relationship between COPD and AF (p = 0.991), whereas reverse Mendelian randomization results suggested causal relationships between AF and COPD (p = 0.018).36

Discrepancies in these research findings can be analyzed from both sample characteristics and methodological dimensions. Our investigation integrated multi-cohort data encompassing 58,559 COPD cases, whereas Yu et al utilized smaller-scale European population data with 13,530 COPD cases, representing approximately 4.3-fold increase in sample size and providing substantially higher statistical power. Methodological rigor differences are equally noteworthy, as our investigation employed more stringent methodological strategies, encompassing five complementary MR methods for cross-validation, MR-PRESSO pleiotropy control, and Steiger directionality testing excluding reverse causation. Despite our study’s distinct advantages in sample size and methodology, potential causal associations between COPD and AF may still require corroboration and validation from independent investigations.

Epigenetic studies suggest that DNA methylation in COPD can modulate oxidative stress and inflammation, contributing to atrial fibrosis and electrical remodeling; aberrant methylation has been reported in ion-channel (eg, KCNQ1) and fibrosis genes (eg, TGF-β) in blood and lung tissues.37,38 However, previously published EWAS combined with MR either ignored the sign of CpG β in EWAS data or selected only positive β values,39,40 introducing potential logical or selection biases. Our investigation systematically compared consistency between COPD EWAS data CpG site β values and SMR result β value positivity/negativity, discovering complex regulatory mechanisms of DNA methylation in COPD-induced atrial fibrillation. Through SMR, we identified 109 sites whose methylation status through genetic regulation might influence AF risk. Among CpG sites with consistent β-value directions between COPD EWAS data and mQTL-SMR results, 48 sites were identified, with 20 CpG sites exhibiting positive correlation between methylation levels and AF risk and 28 demonstrating negative correlation. Among CpG sites with opposite β-value directions between COPD EWAS data and mQTL-SMR results, 61 sites were identified, with 33 exhibiting positive correlation and 28 demonstrating negative correlation. However, the 109 CpG sites identified as potentially causally related (putative causal) to AF through SMR screening were predominantly limited by single cis-SNP instrumental variables, requiring further validation through multi-SNP IV or independent datasets.

Long-term smoking and particulate exposure are major risk factors for COPD and can reshape DNA methylation and activate inflammatory/oxidative pathways.37,41–43 Therefore, we systematically validated associations between colocalization-positive CpG sites and environmental factors. Results demonstrated that among 25 candidate sites, only 5 were smoking-related (20%), with none related to harmful particulate matter, indicating that most epigenetic signals reflect disease-specific mechanisms rather than environmental exposure effects, ensuring result robustness. Subsequently combining multi-omics evidence, we ultimately identified 7 important genes functioning in AF risk during COPD development: FES, ATF5, VSTM4, LPIN3, ICAM4, SLC22A17, and SLC45A3, with FES, ATF5, VSTM4, and LPIN3 serving as mediating genes, and ICAM4, SLC22A17, and SLC45A3 serving as masking effect genes.

FES, ATF5, VSTM4, and LPIN3 demonstrated β-direction consistency, suggesting these genes may participate in mediating COPD-induced atrial fibrillation mechanisms. Among these,

Comments (0)

No login
gif