G-quadruplexes (or “G4s”) are non-canonical nucleic acid structures formed by guanine-rich DNA or RNA sequences. They are built from G-tetrads, each formed by four guanines connected by eight hydrogen bonds. Potassium or sodium cations, abundant in the cellular environment, stabilize these G-tetrads through electrostatic interactions with the carbonyl oxygens of the four guanines. The stacking of two or more G-tetrads constitutes the core of the G4 structure, which thus comprises four guanine pillars, or strands, one at each corner. G4s are polymorphic structures that can sample a vast space of conformations (reviewed here1). Indeed, in intramolecular G4s, the four strands can be connected by three different types of loops (lateral, diagonal or propeller) of variable size and sequence. Consequently, each of the four pillars can be oriented parallel or antiparallel to its adjacent strands. Thermodynamic and kinetic studies have shown that G4s are often stable far above physiological temperature (Tm≫37 °C)2 and can form within a few tens of milliseconds.3 Hence, the cell has very likely evolved mechanisms that exploit these properties of G4s in vivo. Indeed, an increasing number of publications demonstrate their involvement in key biological processes (reviewed here4). Several in cellulo studies using specific G4 probes (antibodies or ligands) have confirmed the existence of G4s in genomic DNA and cellular RNA5, 6, 7 and their involvement in telomere dynamics,8 RNA splicing and translation,9 and DNA replication and transcription.10
In the case of transcription, genome-wide bioinformatics studies have identified 1.5 million putative G4s (pG4s) in the human genome, with up to 66% of human promoters containing at least one pG4. Interestingly, important transcription factor binding sites, such as those of Sp1, MAZ, Krox and ZF5, are positioned near or overlap the pG4s,11 suggesting that G4 formation within promoters plays an important role in transcription regulation (reviewed here12). G4 formation at promoters has been directly related to the level of chromatin compaction: G4s are enriched in relaxed nucleosome-depleted regions (NDRs) located upstream of transcription start sites (TSSs).11 Furthermore, an increase in the proportion of G4s formed at promoters has been observed when chromatin relaxation is induced by histone deacetylase inhibitors.13 In some cases, once formed, these G4s can enhance gene transcription by recruiting transcription factors involved in the RNA Pol II machinery.14 This is supported by the high G4-binding affinity observed in vitro for the Sp1,15 CNBP,16 and LARK17 transcription factors. G4s can stimulate transcription even further by acting as a hub that enables the simultaneous recruitment of a variety of transcription factors.18
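As an illustration of how such genome-wide scans can be set up, the sketch below searches a sequence for a commonly used pG4 consensus motif: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The text does not state which algorithm or thresholds the cited studies used, so the pattern, the function names `find_pg4s` and `reverse_complement`, and the telomeric test sequence are assumptions chosen for illustration only.

```python
import re

# Assumed consensus motif (not taken from the text): four G-runs of >= 3
# guanines separated by three loops of 1-7 nucleotides each.
PG4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pg4s(seq):
    """Return a list of (strand, start, end, motif) tuples.

    Coordinates are 0-based, end-exclusive, and always given on the
    input (plus) strand. Matches on each strand are non-overlapping.
    The motif is reported 5'->3' on the strand where it was found.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # Plus strand: scan the sequence as given.
    for m in PG4_PATTERN.finditer(seq):
        hits.append(("+", m.start(), m.end(), m.group()))
    # Minus strand: scan the reverse complement, then map the
    # coordinates back onto the plus strand.
    for m in PG4_PATTERN.finditer(reverse_complement(seq)):
        hits.append(("-", n - m.end(), n - m.start(), m.group()))
    return hits


if __name__ == "__main__":
    # Four human telomeric repeats (TTAGGG) plus a short 3' flank.
    telomeric = "TTAGGGTTAGGGTTAGGGTTAGGGTTA"
    for hit in find_pg4s(telomeric):
        print(hit)
```

A pattern of this kind only flags candidates. It misses G4s built from two-guanine runs or containing bulges, so it should be read as a first filter and not as a prediction of folding.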
G4s have also been implicated in the replication cycle of a number of viruses such as HIV-1, HCV,19 coronaviruses20 and many others reviewed here.21 HIV-1 is an RNA retrovirus that infects CD4+ cells and induces a deficiency of the immune system that causes AIDS. The HIV-1 viral particle carries two identical RNA genomes. Shortly after infection, the viral RNA is reverse-transcribed into double-stranded DNA. The proviral DNA genome is then integrated into the genetic material of the infected cell. After integration, the provirus uses the transcription machinery of the host cell to transcribe its genetic content. We have recently identified ten evolutionarily conserved G4-forming sequences in the HIV-1 genome.22 Most of these G4s contain crucial regulatory elements such as the PPT and cPPT sequences as well as the U3 region. Interestingly, proviral transcription is regulated by a G-rich sequence located in the U3 region of the viral promoter. This sequence contains the three Sp1 transcription factor binding sites. It is located 50 nucleotides upstream of the transcription start site, next to the TATA box, in the U3 region of the 5′ LTR (Figure 1A, B). We previously analyzed the central part of this G-rich region spanning the three Sp1 binding sites, from the 5′ extremity of Sp1-3 to the 3′ extremity of Sp1-1 (HIVpro1) (Figure 1B). It formed a stable two-G-tetrad antiparallel G4 with an additional Watson-Crick CG base pair.23 The structure of a second G4 spanning the first two Sp1 binding sites has also been solved (LTR-IV) (Figure 1B). It formed a parallel-stranded G-quadruplex containing a single-nucleotide thymine bulge.24 A more recent study showed the formation of a G4-duplex hybrid structure (LTR-III) (Figure 1B) that spans the three Sp1 binding sites.25
Along these lines, we characterize here the structure of a fourth possible G4 conformation adopted by the G-rich fragment of the HIV-1 promoter. This 22-nucleotide segment, referred to as HIVpro2, spans the second and third Sp1 transcription factor binding sites (5′-AGGGAGGTGTGGCCTGGGCGGG-3′). It forms a three-G-tetrad structure with a hybrid-type G4 core that is interrupted by a single-nucleotide bulge. An additional reverse-Hoogsteen-type AT base pair stacks on top of the upper tetrad. Such a G4 structure is potentially able to form during transcription; therefore, a conformational interplay between these G4s might contribute to the regulation of promoter activity.
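To make the composition of the HIVpro2 sequence easier to inspect, the following minimal sketch tallies its guanines and lists its runs of consecutive guanines. The function name `g_runs` and the run-length threshold are hypothetical choices made for illustration; the code does not assign guanines to tetrads, which only the structural data can do.

```python
import re

# HIVpro2 sequence as given in the text (5'->3').
HIVPRO2 = "AGGGAGGTGTGGCCTGGGCGGG"


def g_runs(seq, min_len=2):
    """List runs of at least `min_len` consecutive guanines.

    Returns (first, last, run) tuples with 1-based, inclusive positions.
    """
    pattern = re.compile(r"G{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print("length:", len(HIVPRO2))          # 22 nucleotides
    print("guanines:", HIVPRO2.count("G"))  # 14 guanines
    for first, last, run in g_runs(HIVPRO2):
        print(f"{first}-{last}: {run}")
    # Runs found: 2-4 GGG, 6-7 GG, 11-12 GG, 16-18 GGG, 20-22 GGG
```

Three stacked tetrads require 3 × 4 = 12 guanines, and the sequence offers 14. Because only three runs contain three or more consecutive guanines, at least one of the four guanine columns must be assembled from non-contiguous guanines, which is consistent with the single-nucleotide bulge described above.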