Approximately 3% of the human genome is composed of simple sequence repeats, or microsatellites, thought to arise from slippage during DNA replication.1 The most frequently occurring microsatellites are TG and CA dinucleotide repeats.1, 2, 3, 4, 5, 6 Although TG and CA repeats are equal in number in DNA, there is a strong bias for TG repeats to be downstream of transcription start sites and AC repeats to be upstream.4 Therefore, TG repeats are often transcribed into poly(UG) or “pUG” RNA sequences. The human transcriptome has ∼20,000 pUGs of 12 or more dinucleotide repeats, which are predominantly located in introns.7 (TG)n repeats in genomic DNA are often polymorphic, with n being a variable number, and genome-wide association studies (GWAS) have linked the number of repeats to disease.8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 Some of these diseases may arise from pUG sequences in RNA regulatory regions. For example, a variable pUG region within an intron of the CFTR gene has been correlated with splicing defects, leading to atypical cystic fibrosis with male infertility.11, 12, 19 We therefore hypothesized that disease correlations with (TG)n microsatellites may be due to pUG RNA structure.7 In the case of CFTR, for instance, pUG structure may influence protein recognition of adjacent splice sites.11, 19
In C. elegans, pUGs can be added to RNAs post-transcriptionally by the ribonucleotidyltransferase enzyme Mut-2/Rde-3 (hereafter, Rde-3). Rde-3 adds poly(UG) or poly(GU) tails (“pUG tails”) to RNA 3′ ends, initiating with either G or U with equal frequency.20, 21 pUG tails with 12 or more Gs convert otherwise inert RNAs into potent vectors for gene silencing.21 pUG tails function by recruiting RNA-dependent RNA polymerase (RdRP) to the ends of RNAs, marking them as templates for synthesis of small interfering RNAs (siRNAs).21 Enzymatic cycling of sense-strand pUG tail synthesis and antisense siRNA production amplifies the gene silencing pathway and allows it to persist for generations, a phenomenon known as transgenerational epigenetic inheritance (TEI).21, 22 pUG-mediated TEI is important for transposon silencing and maintenance of genome stability in C. elegans.21
We recently discovered the pUG fold, in which pUGs with 12 or more Gs fold into an atypical RNA quadruplex (G4) structure with a left-handed backbone that resembles Z-RNA7 (Figure 1A and B). The pUG fold G4 is required for gene silencing.7 We determined the crystal structure of the pUG fold bound to the G4-stabilizing ligand N-methyl-mesoporphyrin IX (NMM), which stacks on the structure7 (Figure 1B). The pUG-NMM crystals were disordered due to 4-way twinning, owing to the near-perfect 4-fold symmetry of the structure. The resulting electron density maps were insufficient to precisely locate some features of the RNA, including the positions of the 5′ and 3′ ends and the conformations of bulged uridines.7 Therefore, an improved understanding of the pUG fold structure is needed. Here we report the solution structure of the free pUG fold, determined by nuclear magnetic resonance (NMR) spectroscopy combined with small- and wide-angle X-ray scattering (SAXS-WAXS). The low complexity and high symmetry of dinucleotide repeat sequences present challenges for NMR assignment and structure determination, which we overcame by combining established strategies with new NMR methods. For reproducibility, the entire dataset, including raw and processed data (>30 GB), along with all software used in this study, has been made accessible within the NMRbox virtual machine resource23 at https://www.nmrbox.org (public/pUGNMR). Overall, the solution structure of the pUG fold in the absence of NMM and the crystal structure bound to NMM are similar, with a few notable exceptions described below. The NMR data also show how the pUG fold can form within longer sequences, providing new insights into the mechanism by which pUG repeats can fold and function in cells.
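To make the sequence criterion above concrete, the following is a minimal illustrative sketch of how candidate pUG runs (uninterrupted alternating G/U stretches containing 12 or more Gs, beginning with either G or U) could be located within a longer sequence. It is not the analysis used in this study or in ref. 7. The function name `find_pug_runs`, the `min_g` parameter, and the convenience step of converting T to U are assumptions introduced here for illustration. A sequence match indicates only that a run meets the length criterion, not that it folds.

```python
import re

# Hypothetical helper for illustration only; not part of the published analysis.
# Matches an uninterrupted alternating run that starts with U or with G.
_ALTERNATING_GU = re.compile(r"(?:UG)+U?|(?:GU)+G?")


def find_pug_runs(sequence, min_g=12):
    """Return (start, end, g_count) for each alternating G/U run with >= min_g Gs.

    Coordinates are 0-based and half-open. DNA input is accepted by converting
    T to U (an assumption made for convenience).
    """
    rna = sequence.upper().replace("T", "U")
    runs = []
    for match in _ALTERNATING_GU.finditer(rna):
        g_count = match.group().count("G")
        if g_count >= min_g:
            runs.append((match.start(), match.end(), g_count))
    return runs
```

For example, `find_pug_runs("AAA" + "UG" * 12 + "CCC")` reports a single run spanning positions 3 to 27 with 12 Gs, whereas a run of only 11 GU repeats is not reported.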