Evolutionary and structural insights into DNMTs and TETs: decoding their functional heterogeneity and oncogenic roles in methylation regulation

Evolutionary differentiation and structural characteristics of the DNMT and TET families in mammals

DNMTs mediate DNA methylation in mammals, whereas TETs mediate demethylation mainly by converting 5mC into 5hmC, 5fC, and 5caC (Fig. 1A). Phylogenetic analysis of the DNMT and TET families across seven mammalian species reveals distinct evolutionary trajectories (Fig. 1B). Separate evolutionary trees of mammalian TET and DNMT proteins were constructed (Fig. S1), and additional trees focusing on human TET and DNMT proteins were also generated (Fig. S2). The trees show that DNMT1 and DNMT2 are more closely related to each other than to DNMT3A, DNMT3B and DNMT3L. DNMT1 is primarily involved in the maintenance of DNA methylation during replication, while DNMT3A and DNMT3B are implicated in de novo methylation. The divergent domain architectures of these proteins underscore their functional specialization (Fig. 1C): DNMT1 possesses zinc-finger CXXC (CXXC), replication focus targeting sequence (RFTS), and bromo-adjacent homology (BAH) domains, whereas DNMT3A/B contain Proline-Tryptophan-Tryptophan-Proline (PWWP) and ATRX-DNMT3-DNMT3L (ADD) domains. Similarly, TET1 and TET3 are more closely related to each other than either is to TET2, implying conserved regulatory roles across species.

All DNMTs utilize S-adenosylmethionine (SAM) as a methyl donor and employ a base-flipping mechanism to reposition the target base into the catalytic pocket of the enzyme [24]. Consequently, the C-terminal catalytic domain is crucial for DNMT functionality (Fig. 1D). TET proteins are dioxygenases that depend on iron(II) and α-ketoglutaric acid (Fe(II)/α-KG). Their carboxy-terminal core catalytic domain is characterized by a double-stranded β-helix (DSBH) domain and a cysteine-rich domain [25]. The DSBH domain brings Fe(II), α-KG, and 5mC together for oxidation, while the cysteine-rich domain wraps around the DSBH core to stabilize the overall structure and TET-DNA interactions [15] (Fig. 1D).

Fig. 1

The epigenetic modifications and structural features of DNMT and TET proteins. (A) Chemical reactions depicting the conversion of cytosine bases in DNA by DNMTs and TETs, showing the formation of 5mC, 5hmC, 5fC, and 5caC. (B) Circular phylogenetic tree illustrating the evolutionary relationships among different DNMT and TET proteins across various species, with colored ranges indicating different protein types. The tree was constructed using the maximum likelihood method (bootstrap = 500). (C) Domain architecture diagrams of DNMT and TET proteins, detailing the lengths (in amino acids) and functional domains such as CXXC, PWWP, DMAP1-binding, BAH, ADD, CRD, RFTS, Catalytic domain, Low complexity insert, and DSBH. (D) Structural models of DNMT and TET proteins predicted by AlphaFold, with the catalytic domain and other structural elements highlighted in different colors

Conserved catalytic structures and DNA-interacting features of DNMT and TET proteins

Notwithstanding their divergent catalytic roles, the DNMT and TET protein families engage with DNA, thereby enabling an investigation into structural similarities via conserved sequences within their protein domains.

In our analysis of the human genes encoding DNMT and TET proteins, we identified an evolutionarily conserved sequence within the DNMT family (Fig. 2A) and labelled the critical binding positions in this conserved region. Mutation of C1226 of DNMT1 to A renders the enzyme inactive [26]. The corresponding position in DNMT3A, C710, is important for methylation activity [27]. In TET2, G1256-T1273 and G1288-D1312 can form a shallow groove that interacts with DNA; we also labelled the H-D/E-H site in the TET family, which is essential for binding ferrous ions [25]. The three-dimensional structure of DNMT3A exhibits a conserved region characterized by six β-sheet structures interconnected by α-helices (Fig. 2B). Similarly, the TET C-terminal catalytic domain features ten conserved and five semi-conserved β-sheets, including DSBH domain elements (Fig. 2B). These structures form rigid planar architectures stabilized by interchain hydrogen bonds, conferring structural stability.
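The notion of a "corresponding position" between two family members (such as C1226 of DNMT1 and C710 of DNMT3A) can be illustrated with a minimal sketch that maps a residue number from one sequence to another through a gapped pairwise alignment. The function name, the toy alignment strings, and the start offsets below are hypothetical and are not taken from the alignment in Fig. 2A.

```python
def map_aligned_position(aln_a, aln_b, pos_a, start_a=1, start_b=1):
    """Map residue number pos_a of sequence A onto sequence B.

    aln_a and aln_b are two rows of a pairwise alignment ('-' marks a gap).
    start_a and start_b are the residue numbers of the first non-gap
    character in each row. Returns None if pos_a faces a gap in B or
    does not occur in the aligned region.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    num_a, num_b = start_a - 1, start_b - 1
    for ch_a, ch_b in zip(aln_a, aln_b):
        if ch_a != "-":
            num_a += 1
        if ch_b != "-":
            num_b += 1
        if ch_a != "-" and num_a == pos_a:
            return num_b if ch_b != "-" else None
    return None


# Toy alignment (hypothetical sequences, not real DNMT fragments).
row_a = "GPC-GLS"   # residues 10..15 of a made-up protein A
row_b = "-PCNGLF"   # residues 20..25 of a made-up protein B
print(map_aligned_position(row_a, row_b, 12, start_a=10, start_b=20))
```

In the toy example the cysteine at position 12 of A faces the cysteine at position 21 of B, while position 10 of A faces a gap and therefore has no counterpart.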

Consistent with previous studies, the present results suggest that the conserved β-strands of DNMT and TET proteins are essential for the integrity and activity of their catalytic domains [25]. Variant residues are critical for substrate coordination and catalytic activity. Structural modeling suggests β-sheets mediate DNA interactions through base/phosphate-group contacts via dedicated residues, while adjacent α-helices modulate structural polarity and flexibility without compromising β-sheet-derived stability.

Moreover, both protein families demonstrate a significant presence of basic amino acid residues within their catalytic regions (Fig. 2C). This positively charged residue-rich environment likely facilitates interactions with negatively charged DNA, enhancing substrate binding and catalytic efficacy. In summary, the conserved structural architecture of DNMT and TET catalytic domains underpins their functional versatility and efficacy in engaging with DNA.
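As a rough sequence-level complement to the electrostatic surfaces, the enrichment of basic residues in a region can be estimated by simple counting. The sketch below is an illustration only: the function names and the toy sequence are hypothetical, histidine is optional because it is only partly protonated near physiological pH, and the crude net charge ignores pKa values and structural context.

```python
def basic_fraction(seq, include_his=False):
    """Fraction of basic residues (K, R, optionally H) in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    basic = set("KRH") if include_his else set("KR")
    return sum(res in basic for res in seq) / len(seq)


def crude_net_charge(seq):
    """Count K/R as +1 and D/E as -1; a coarse proxy, not a pKa model."""
    seq = seq.upper()
    positive = sum(res in "KR" for res in seq)
    negative = sum(res in "DE" for res in seq)
    return positive - negative


toy_region = "MKRAHKDE"  # made-up peptide, not a DNMT or TET fragment
print(basic_fraction(toy_region), crude_net_charge(toy_region))
```

A surface potential such as the one in Fig. 2C depends on the three-dimensional arrangement of charges, so a count like this can only suggest, not demonstrate, a positively charged DNA-facing surface.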

Fig. 2

Structure and sequence analysis of DNMT and TET proteins. (A) Sequence comparison of DNMT and TET proteins with conserved residues highlighted in yellow (relatively conserved) and red (more conserved). Purple box: critical region for substrate binding. Sky blue dots: DNMT key residues. Dark blue dots: TET residues that bind Fe ions. Brown arrows: β-sheet; light blue columns: α-helix. The UniProt accession numbers of the human sequences used in this figure are given in the supporting data. (B) Domain schematic diagrams of DNMT and TET proteins, with arrows indicating β-sheet structures. Pale green: β-sheet structure of DNMT. Pale pink: β-sheet structure of TET. The AlphaFold-predicted structures of DNMT3A and TET2 are used in the structural presentation here. (C) Surface representations of DNMT1, DNMT2, DNMT3A, TET1, TET2, and TET3 proteins, colored by electrostatic potential (blue for positive, red for negative). The charge value range is displayed magnified. All the protein structures used above are predicted by AlphaFold

Analysis of structural differences between DNMTs and TETs

TETs exhibit higher loop content and structural flexibility compared to DNMTs, reflecting distinct epigenetic roles

Using AlphaFold-predicted structures, we calculated the secondary structure percentages of the TET and DNMT families. The calculations were based on full-length protein structures, including catalytic domains and intrinsically disordered regions (IDRs). Quantitative secondary structure analysis showed that the TET family contains a higher proportion of loops than the DNMT family (Fig. 3A). We attribute the high loop proportion to the presence of extensive IDRs and flexible linkers. We also calculated the secondary structure percentages of individual TET and DNMT members from experimental structures (which partly cover only the catalytic domains) and from predicted structures (Supporting Data 1 and 2), as well as family-level percentages from the experimental structures. Considering that DNMT2 may play a less prominent role in vivo, we additionally computed the mean values for DNMT1/3A/3B/3L only (Supporting Data 3). In all of these analyses, TET consistently showed a higher proportion of loop structures than DNMT, leading us to speculate that high loop content might be a structural feature of TET.
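The text does not state which tool was used to assign secondary structure. As a minimal sketch, assuming a per-residue string of DSSP-style codes is already available, the percentages and a family mean could be computed as follows. The three-state reduction (H/G/I as helix, E/B as sheet, everything else as loop) is a common convention adopted here as an assumption, and the input strings are made up.

```python
from collections import Counter

HELIX_CODES = set("HGI")   # alpha, 3-10 and pi helices
SHEET_CODES = set("EB")    # extended strand and isolated bridge
STATES = ("helix", "sheet", "loop")


def ss_percentages(dssp_codes):
    """Percent helix, sheet and loop in a per-residue DSSP-style string."""
    if not dssp_codes:
        raise ValueError("empty secondary structure string")
    counts = Counter()
    for code in dssp_codes:
        if code in HELIX_CODES:
            counts["helix"] += 1
        elif code in SHEET_CODES:
            counts["sheet"] += 1
        else:
            counts["loop"] += 1   # turns, bends and unassigned residues
    total = len(dssp_codes)
    return {state: 100.0 * counts[state] / total for state in STATES}


def family_mean(members):
    """Unweighted mean of the per-protein percentages in a family."""
    per_protein = [ss_percentages(codes) for codes in members.values()]
    return {s: sum(p[s] for p in per_protein) / len(per_protein) for s in STATES}


# Hypothetical toy strings standing in for two family members.
toy_family = {"memberA": "HHHHEE--TS", "memberB": "EEEE------"}
print(family_mean(toy_family))
```

An unweighted mean treats each family member equally regardless of length; a length-weighted mean is an equally defensible choice and would give different numbers when members differ greatly in size.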

The structure of a protein determines its function, and its three-dimensional conformation directly affects its specific binding to substrates, ligands, or other molecules and hence its biological activity. TET proteins catalyze the oxidation of 5mC, which involves multiple reaction intermediates and depends on conformational flexibility provided by their loop structures. Conversely, DNMT proteins maintain stable DNA methylation, necessitating strong binding to DNA, which correlates with a lower loop ratio. Notably, TET proteins also have low-complexity insertion regions that may contribute to their higher loop content.

Varying evolutionary pressures have shaped the structural adaptations of these protein families. TET proteins likely underwent extensive changes to meet the complexities of DNA demethylation and epigenetic regulation, resulting in an increased proportion of loop structure. Conversely, DNMT proteins have remained relatively conserved, maintaining their essential functions in DNA methylation across species, resulting in a stable loop ratio.

Structural analyses of protein-DNA complexes indicate that DNMT structures feature multiple helical components within their DNA-binding regions. In TET proteins, the DNA-bound region consists mainly of loop and β-sheet structures, while the area behind the DNA-binding site contains helical structures (Fig. 3B). This variation highlights the distinctions in secondary structural components. DNMT proteins require stable binding at CpG sites, rendering helical structures essential for effective methylation. In contrast, the flexibility provided by the loop and β-sheet structures of TET allows for adaptation during the oxidative demethylation process. Nevertheless, the helical structures behind the DNA-binding site in TET proteins provide a level of stability that is critical for their diverse interactions.

Distinct conformational flexibility of CXXC domains suggests distinct DNA binding mechanisms between DNMTs and TET proteins

The analysis presented indicates that DNMT1, TET1, and TET3 all contain the CXXC domain. However, there are significant distinctions among these proteins [28, 29].

Structural analysis of the CXXC-DNA complexes revealed differences in both structural morphology and conformational rigidity among the CXXC domains of the three proteins. Specifically, B-factor analysis showed that the CXXC domain of DNMT1 has a uniform flexibility distribution, whereas the CXXC domains of TET1 and TET3 show marked differences in flexibility across regions (Fig. 3C). These results suggest that the three proteins differ in structural rigidity and conformational plasticity, which may have ramifications for their interactions with DNA and other factors. Note, however, that the B-factor is not directly indicative of DNA binding affinity unless it is derived from structural data of the complex. This study therefore only postulates that differences in the B-factor may affect the mode of binding to DNA or other factors; differences in binding affinity require experimental validation.
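A comparison of this kind can be sketched by reading Cα B-factors for a residue range from a PDB-format file and summarizing their spread. The sketch below relies only on the fixed-column layout of PDB ATOM records; the helper names, the toy records, and the use of the coefficient of variation as a measure of "uniform" versus "uneven" flexibility are assumptions for illustration, not the procedure used for Fig. 3C.

```python
from statistics import mean, pstdev


def ca_bfactors(pdb_text, chain, first, last):
    """Collect C-alpha B-factors for residues first..last of one chain.

    Uses PDB fixed columns: atom name 13-16, chain 22, residue number
    23-26, B-factor 61-66. Alternate locations are not filtered here.
    """
    values = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if first <= int(line[22:26]) <= last:
            values.append(float(line[60:66]))
    return values


def flexibility_summary(bfactors):
    """Mean, population SD and coefficient of variation of B-factors."""
    mu = mean(bfactors)
    sd = pstdev(bfactors)
    return {"mean": mu, "sd": sd, "cv": sd / mu if mu else float("nan")}


def toy_ca_record(serial, resseq, bfactor, chain="A"):
    """Build a made-up glycine C-alpha ATOM record in PDB column layout."""
    return ("ATOM  {:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
        serial, " CA", " ", "GLY", chain, resseq, " ", 0.0, 0.0, 0.0, 1.0, bfactor)


# Two hypothetical domains: one with even and one with uneven B-factors.
even = "\n".join(toy_ca_record(i, i, b) for i, b in enumerate([20, 21, 19, 20], 1))
uneven = "\n".join(toy_ca_record(i, i, b) for i, b in enumerate([15, 45, 20, 60], 1))
print(flexibility_summary(ca_bfactors(even, "A", 1, 4))["cv"])
print(flexibility_summary(ca_bfactors(uneven, "A", 1, 4))["cv"])
```

Because absolute B-factors depend on resolution and refinement, a within-structure measure such as the coefficient of variation is easier to compare across different crystal structures than raw values, although it remains a coarse indicator.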

The CXXC domain of DNMT1 may exhibit relatively low conformational flexibility, which may support more stable DNA interactions. The CXXC domain of TET1 is relatively flexible and lacks overt DNA binding activity. However, this flexibility allows TET1 to be recruited to specific chromatin regions through interactions with other proteins (such as NANOG). Although the CXXC domain of TET1 does not directly bind DNA, its presence is essential for the biological function of TET1 [30]. The CXXC domain of TET3 exhibits remarkable conformational stability, attributable to its elevated rigidity. This property facilitates its precise anchoring to the gene promoter region, ensuring sustained residence and efficient execution of demethylation reactions [31].

Structural domain divergence within the DNMT family suggests functional specialization and substrate preferences

DNMT1 possesses a conserved C-terminal catalytic domain alongside a distinctive N-terminal regulatory domain, which encompasses functional subdomains such as the DMAP1-binding domain, the PCNA-binding domain, a nuclear localization signal (NLS), RFTS, CXXC, and two BAH domains [32]. In contrast, DNMT3A and DNMT3B play essential roles in de novo methylation in mammals, with DNMT3L enhancing their activity as a cofactor. DNMT3A and DNMT3B are structurally similar, consisting, from N to C terminus, of a variable region for context-dependent regulation, a PWWP domain for initial DNA anchoring, cysteine-enriched regions for stability, and a C-terminal catalytic region [33]. Notably, DNMT3A exhibits a preference for specific CpG sites, while DNMT3B is more active at non-CpG sites, particularly in regions associated with ICF syndrome, reflecting structural differences.

In contrast, DNMT2 possesses only a C-terminal catalytic domain. Structural analysis identified a loop region (residues 170–234) in the catalytic domain of DNMT2 (Fig. 3D). This loop is hypothesized to be involved in tRNA binding, which may underlie a distinct substrate recognition mechanism and enable the specialized function of DNMT2 in tRNA methylation. For a more comprehensive reference, we further present structural comparisons of the corresponding regions among selected members of both the DNMT and TET families (Fig. S3).

Distinct features of structural analysis of CXXC domains in the TET family

Both TET1 and TET3 contain the CXXC domain, which can bind to non-methylated CpG dinucleotides [34, 35]. The DNA-binding ability of the CXXC domain is closely related to its structural rigidity and electrostatic surface properties. In contrast, TET2 lacks an N-terminal CXXC DNA-binding domain, but its DNA binding specificity can be regulated indirectly through the IDAX (also known as CXXC4) protein. This unique structural feature confers flexibility to TET2 in regulating chromatin states, enabling it to dynamically control gene expression by oxidizing 5mC [36]. IDAX expression leads to caspase activation and TET2 protein degradation, a process dependent on DNA binding by the IDAX CXXC domain [37].

Additionally, the low-complexity insertion regions among TET1, TET2, and TET3 consist of unconserved amino acid sequences that predominantly form loop structures. Structural comparison indicates variability in the dimensions of these regions (Fig. 3D), which significantly affects the conformation of the protein. Located within the core catalytic cavity, these insertion regions may contain multiple modification sites that could influence catalytic activity and potentially regulate the conformation of TET proteins. Prior studies have highlighted the crucial role of low-complexity insertion regions in the functional integrity of TET proteins, particularly in mechanisms that reduce the risk of overoxidation [38,39,40].

Fig. 3

Structural insights into DNMT and TET proteins and their interactions with DNA. (A) Cartoon depictions of DNMT and TET proteins, with helices colored cyan, sheets orange, and loops light gray. The accompanying pie charts show the proportion of each secondary structure type within DNMT and TET proteins. (B) Surface models of DNMT1 (PDB:3PTA) and DNMT3A (PDB:5YX2) proteins bound to DNA, with a 180° rotation view of the TET2 protein-DNA complex (PDB:4NM6); helices are colored bright orange, strands pale cyan, and loops white. (C) Comparative analysis of the CXXC structures of DNMT1 (PDB:3PTA), TET1 (PDB:6ASD), and TET3 (PDB:4Z3C). The B-factors of each CXXC domain were analyzed to compare the domains from the perspective of protein flexibility. In the B-factor putty representation, the color gradient from blue through green and yellow to red indicates increasing flexibility, with blue representing the most rigid regions and red the most flexible. (D) Key structures of DNMT2 and TET1/2/3. Orange: residues 170–234 of DNMT2. Purple: loop region of TET1/2/3. The protein structures used above are predicted by AlphaFold

DNMT and TET family proteins exhibit distinct tissue-specific expression patterns

The tissue-specific expression of DNMT and TET family members is regulated by a combination of transcriptional, epigenetic, and post-transcriptional mechanisms [2, 41]. We mapped their tissue-specific expression profiles (Fig. 4). As shown in Fig. 4, DNMT1 is highly expressed in most tissues, especially in the liver and kidneys. DNMT1-mediated epigenetic regulation is considered essential for postnatal liver growth and regeneration and has also been proposed as a therapeutic target for alleviating diabetic nephropathy [42].

DNMT3A and DNMT3B are predominantly active in germ cells and early embryonic development. Their expression underscores an essential role in establishing DNA methylation patterns. Aberrant expression of these proteins has been linked to various cancers, such as colorectal cancer and acute myeloid leukemia. Similarly, DNMT3L is highly expressed in germ cells and, while lacking methyltransferase activity, enhances the function of DNMT3A and DNMT3B through interactions.

Conversely, TET1, TET2, and TET3 are key players in DNA demethylation. TET2 is highly expressed in the hematopoietic system, and its functional impairment has been linked to the development of leukemia. TET3 is notably expressed in the nervous system, indicating its potential role in neurodevelopment and function. These tissue-specific expression patterns underscore the functional specialization of each protein within the DNMT and TET families.

Notably, protein structure and tissue-specific expression are closely related, as the examples of TET2 and DNMT3A illustrate. TET2 is distinguished by the absence of the CXXC domain: as a consequence of a chromosomal inversion, this domain is instead encoded by the adjacent gene IDAX. TET2 is expressed at high levels in hematopoietic stem cells and their progenitor cells, and its expression gradually decreases during differentiation. Studies have demonstrated that the structural features of TET2, including the catalytic domain, directly influence its function, regulating the self-renewal, lineage-specific differentiation, and terminal maturation of hematopoietic stem cells [43]. In 2019, the ADD domain and the PWWP domain of DNMT3A were found to recognize the histone modifications H3K4me0 and H3K36me2/3, respectively [44]. This structure-dependent histone interaction confers tissue-specific localization capabilities on DNMT3A. For example, in certain tissues, specific histone modification states recruit DNMT3A to perform DNA methylation in specific genomic regions, thereby influencing the spatiotemporal specificity of gene expression [45]. Future research should examine more closely the interplay between protein structure, signaling pathways, and disease mechanisms.

Fig. 4

Tissue-specific expression profiles of DNMT and TET proteins. Bar charts presenting the expression levels of DNMT1, DNMT2, DNMT3A, DNMT3B, DNMT3L, TET1, TET2, and TET3 across different tissues. Each bar corresponds to the expression level of a specific protein in a given tissue

The role of DNMT and TET proteins in multiple signaling pathways affects the occurrence of cancer

Cancer signaling pathway analysis (Fig. 5) underscores the involvement of the DNMT and TET protein families in a range of malignancies, such as liver, colorectal, cervical, ovarian, lung, gastric, and breast cancer. The regulation of these proteins is predominantly mediated by the Wnt/β-catenin, PI3K-AKT, AMPK, NF-κB/Hippo, and Notch1/c-Myc signaling pathways.

DNMT1 affects the self-renewal and maintenance of liver cancer stem cells (CSCs) by regulating the methylation status of BEX1, making it a potential therapeutic target in hepatoblastoma (HB) and CSC-HCC [46]. In colorectal cancer, the interaction of DNMT1 with β-catenin regulates the Wnt signaling pathway, affecting tumorigenesis and progression [47]. Moreover, in cervical cancer DNMT1 affects the expression of matrix metalloproteinase 14 (MMP14) and hepatocyte nuclear factor 1 alpha (HNF1A) via miR-484, and thereby regulates the malignant characteristics of tumors [48].

In lung cancer, DNMT3A is inhibited by miR-708-5p, which reduces DNA methylation levels and upregulates expression of the tumor suppressor gene CDH1, thus inhibiting the Wnt/β-catenin signaling pathway and impairing the stemness properties of non-small cell lung cancer (NSCLC) cells [49]. In ovarian cancer, DNMT3A expression is inhibited by combined curcumin and DAC treatment, affecting the Wnt/β-catenin signaling pathway and alleviating ovarian cancer development [50].

TET1 influences epithelial-mesenchymal transition (EMT) and cancer stem cell properties, suppressing tumor metastasis and self-renewal [51]. In nasopharyngeal carcinoma (NPC), TET1 expression is downregulated; TET1 restores expression of the Wnt antagonists DACT2 and SFRP2 through demethylation and thereby inhibits the Wnt/β-catenin signaling pathway [52]. TET2 affects the PI3K/AKT/mTOR signaling pathway in breast cancer by regulating miR-660-5p, and promotes tumor progression [53]. TET3 promotes tumor proliferation, migration, and invasion in thyroid cancer by modulating the AMPK pathway [54]. Among these signaling pathways, the Wnt pathway emerges as the primary signaling pathway associated with DNMT and TET in the context of cancer.

Interestingly, DNMT1 and DNMT3A are often overexpressed in tumors, promoting aberrant methylation of tumor-related genes [55, 56]. TET proteins, especially TET2, are often downregulated in solid tumors [57, 58]. The underlying mechanism requires further in-depth study. Additionally, mutations that alter protein structure have been shown to induce aberrant regulation of signaling pathways, thereby precipitating uncontrolled cellular behavior and fostering cancer development. For instance, mutations in the catalytic domain of TET2 result in the loss of its catalytic activity, leading to elevated DNA methylation in the hematopoietic system. This, in turn, inhibits the expression of DNA repair genes, such as Breast Cancer 1 (BRCA1), thereby exacerbating genomic instability and ultimately driving the development of hematological tumors [59, 60]. Mutations in the PWWP domain of DNMT3A disrupt its ability to recognize H3K36me2/3 modifications, thereby interfering with DNA methylation patterns. In the context of cancer, such mutations have been observed to reduce the methylation levels of regulatory elements, such as enhancers, thereby activating stem cell-related genes and promoting tumorigenesis [61, 62].

Fig. 5

Analysis of DNMT and TET protein regulatory pathways. Network diagram depicting the associations between DNMT and TET proteins and various cancers. All the protein structures used above are predicted by AlphaFold. Proteins are visualized by their structural representations, and cancers are illustrated as organ icons
