Single-cell multiomics of the human retina reveals hierarchical transcription factor collaboration in mediating cell type-specific effects of genetic variants on gene regulation

Single-nuclei multiomics profiling of 20 healthy human donor retinae

To profile gene expression and chromatin accessibility in a cell type-specific context, we performed snRNA-seq and snATAC-seq on the healthy retinae from 20 human donors (Fig. 1a, Additional file 2: Table S1). For snRNA-seq, upon quality control (QC), a total of 192,792 nuclei were clustered into 10 major retinal cell classes, including rod photoreceptors (Rod), cone photoreceptors (Cone), bipolar cells (BC), amacrine cells (AC), horizontal cells (HC), müller glia cells (MG), retinal ganglion cells (RGC), astrocytes (Astro), endothelial cells, and microglia cells (“Methods”, Fig. 1b, Additional file 1: Fig. S1a). In parallel, snATAC-seq was performed for the same set of donor retinae. After QC, a total of 245,940 nuclei were clustered into 9 major retinal cell classes (Fig. 1b, Additional file 1: Fig. S1b). Consistent with the cell type annotation, canonical cell type marker genes show specific expression and gene activity in the corresponding cell clusters from snRNA-seq and snATAC-seq respectively [27] (Fig. 1c). Furthermore, the distributions of cell type proportions profiled by the two methods are highly concordant across the samples, ranging from 2.5% RGC to 55.2% Rod (Fig. 1d, Additional file 2: Table S2).

A total of 430,567 open chromatin regions (OCRs) were identified from the snATAC-seq data, ranging from 48,764 to 199,666 per cell type (“Methods”, Additional file 2: Table S3). To assess the quality of these OCRs, we compared them with OCRs detected by bulk ATAC-seq [28]. The snATAC-seq OCRs exhibit high sensitivity, capturing most OCRs identified by bulk ATAC-seq (Additional file 1: Fig. S1c,d,e). Specifically, 74.9 and 84.2% of OCRs identified by bulk ATAC-seq of retina and macula tissues were detected in the snATAC-seq dataset respectively [28]. Additionally, 96.2% of putative active enhancers identified in previously published bulk seq data were found in the snATAC-seq OCR list [28] (Additional file 1: Fig. S1c). Consistent with Rod being the most abundant cell type in the retina, OCRs identified in Rod show the strongest correlation with those in the bulk retina data (a Pearson correlation of 0.69), while lower correlations were observed for the cell types that represent a small proportion of the total retinal cell population (for example, a Pearson correlation of 0.41 for RGC, Additional file 1: Fig. S1d). Conversely, 74.0% of the OCRs were only detected by snATAC-seq, indicating a large portion of OCRs are specific to a subset of cell types. Indeed, 51.5% of the snATAC-seq OCRs are unique to one cell type (Additional file 2: Table S3), which were largely missed by bulk ATAC-seq with a low detection rate of 14.3% (Additional file 1: Fig. S1e). To further evaluate the snATAC-seq OCRs, we investigated the enrichment of TF binding motifs within the OCRs for each cell type (Fig. 1e). Consistently, many identified TFs are previously shown to play cell type-specific role in the retina, such as OTX2, CRX, MEF2D in photoreceptor cells, ONECUT2 in HC, NFIA, NFIB, NFIX, and LHX2 in MG, supporting our findings [29,30,31,32,33].

Based on our multiomics dataset, we identified putative cis-regulatory elements (CREs) and their target genes by correlating OCR accessibility with nearby (± 250 kb) promoter accessibility or gene expression; we refer to these OCRs as linked CREs (LCREs) (Fig. 1f). Approximately 16.6% (71,274) of OCRs are linked to 13,405 target genes, averaging 5.9 LCREs per gene per cell type. As expected, LCREs are enriched for the CREs identified in previous studies, with 74.2 and 87.0% of the LCREs found in the ENCODE cCRE registry [6] and recent cCREs atlas [16] respectively (1.44- and 1.26-fold enrichment compared to all the OCRs, two-sided binomial test, \(<2.2\times ^\)). Furthermore, LCREs are highly enriched with active enhancers. For example, 81.8% of LCREs in Rod carry the epigenetic modifications of active enhancers, concurrent H3K4me2 and H3K27ac, representing a 2.1-fold enrichment compared to all the OCRs (two-sided binomial test, \(p<2.2\times ^\), Fig. 1g). For each cell type, on average 5.9% of LCREs are in cell type-specific OCRs, 62.1% of LCREs are from OCRs shared by multiple cell types, and 32.0% of LCREs are located in constant OCRs (Fig. 1g). LCREs tend to be in more dynamic OCRs with overall 57.3% in differential accessible regions (DARs), representing a 2.2-fold enrichment compared to all the OCRs (\(p<2.2\times ^\), Additional file 1: Fig. S1f).

Significant proportion of sc-eQTLs are cell type-specific in retina

To profile genetic variation in the donors, WGS was conducted on each donor, yielding a total of 9.8 million genetic variants after QC (Additional file 1: Fig. S2a). To identify genetic variants that have an impact on gene expression, we mapped sc-eQTLs for each major retinal cell type. Due to the limited number of individuals available for our study, we focused on variants with allele frequency ≥ 0.1 that are located within OCRs surrounding the genes (± 250 kb of gene transcription start site, TSS), totaling 421,004 variants, averaging 59.9 variants per gene and 2.8 variants per OCR per cell type.

A total of 14,377 sc-eQTLs that reach gene-level significance with false discovery rate (FDR) < 10% were identified. These sc-eQTLs were further grouped based on linkage disequilibrium (LD) (\(^>0.5\)) and their eGenes, resulting in 5688 independent sc-eQTL sets associated with 4069 sc-eGenes, ranging from 704 to 1175 sc-eQTL sets per cell type (Fig. 2a, b, Additional file 2: Table S4). The majority (86.1–91.8%) of sc-eGenes has only one sc-eQTL set per cell type (Additional file 1: Fig. S2b). Interestingly, most of sc-eQTLs are cell type-specific, with 87.0–92.3% identified in only one cell type (Fig. 2a). Furthermore, the remaining sc-eQTLs that are observed in multiple cell types are often shared among closely related cell types, such as between rod and cone photoreceptors (Additional file 1: Fig. S2c). Consistently, the effect of sc-eQTLs across cell types is correlated with the similarity between cell types (Fig. 2c), for example, a stronger correlation is observed between rod and cone photoreceptors (Pearson correlation \(r=0.6\)). These results suggest that genetic variants have a more consistent impact on gene regulation among closely related cell types that share similar transcription programs. Interestingly, for sc-eQTLs located in non-promoter OCRs, those shared by multiple cell types tend to have greater effect sizes compared to those unique to one cell type (e.g., Rod, two-sided Wilcoxon rank sum test, \(p=1.88\times ^\), Fig. 2d, Additional file 2: Table S5). Consistently, for sc-eQTLs located in non-promoter OCRs, those shared by multiple cell types are located closer to the TSS of their eGenes than those unique to one cell type (e.g., Rod, two-sided Wilcoxon rank sum test, \(p=9.75\times ^\), Fig. 2e, Additional file 2: Table S6).

Fig. 2figure 2

Identification of sc-eQTLs in retinal cell types. a Bar plot showing the number of independent index sc-eQTLs reaching gene-level FDR < 0.1 per cell type, colored by the number of cell types where a sc-eQTL is significant. b Bar plot showing the number of sc-eGenes reaching gene-level FDR < 0.1 per cell type, colored by the number of cell types where a sc-eGene is significant. c Heatmap showing the Pearson correlation of sc-eQTL effect size across retinal cell types. d Violin plot showing the distribution of absolute effect size of the sc-eQTLs located in non-promotor OCRs identified in Rod, colored by whether sc-eQTLs were identified in one (n = 1969) or more cell types (n = 539). To compare the effect size between the two types of sc-eQTLs, two-sided Wilcoxon rank sum test was performed, \(p=1.88\times ^\). e Violin plot showing the distribution of the distance between sc-eQTL and eGene TSS for the sc-eQTLs located in non-promotor OCRs identified in Rod, colored by whether sc-eQTLs were identified in one (n = 1969) or more cell types (n = 539). To compare the distance distribution between the two types of sc-eQTLs, two-sided Wilcoxon rank sum test was performed, \(p=9.75\times ^\). f Bar plot showing proportion of sc-eQTLs overlapping with bulk eQTLs, colored by the tissue types where the bulk eQTL is significant. sc: the identified sc-eQTLs. Other: other tissue. g Heatmap showing effect size of sc-eQTLs and the overlapped bulk eQTLs. Each row corresponds to an eQTL. Each column corresponds to a tissue or cell type. h Scatter plot showing the effect size of the overlapped sc-ASEs (X axis) and sc-eQTLs (Y axis) in the corresponding cell type. The Pearson correlation coefficient and p-value are indicated in the plot

Validation of sc-eQTLs with bulk eQTLs and sc-ASE

To evaluate the quality of sc-eQTLs, we compared them with the bulk eQTLs previously identified in retina and other tissues through bulk RNA-seq from the GTEx project [34]. sc-eQTLs are enriched for bulk eQTLs. On average, 35.6% of sc-eQTLs are overlapped with the bulk retina eQTLs (4.4-fold enrichment compared to background variants, two-sided binomial test \(p<1.2\times ^\)) and 56.0% overlapped with the bulk eQTLs from all the 49 tissues (2.3-fold enrichment compared to background, two-sided binomial test \(p<2.1\times ^\), Fig. 2f). The proportion of overlap varies across cell types (Fig. 2f). As expected, the highest overlap (63.9%) is observed for the most abundant cell type, Rod, while the lowest overlap is observed for HC (49.0%) (Fig. 2f). Importantly, the effect directions of eQTLs are largely concordant across different cell types and tissues (Fig. 2g).

We further validated these sc-eQTLs by comparing them with sc-ASEs. sc-eQTLs were found to be enriched for sc-ASEs. Among the sc-eQTLs tested for sc-ASEs, sc-ASEs were detected in 18.8–34.0% of the variants (with the highest overlap in Rod, 34.0%), on average 2.5-fold enrichment compared to the background variants (two-sided binomial test \(p<1.2\times ^\), Additional file 1: Fig. S2d). The effect size and direction are positively correlated between the overlapped sc-eQTLs and sc-ASEs (Pearson correlation, \(r\) in 0.68–0.77, \(p<2.2\times ^\)), with most (82.5–94.2%) of the overlapped variants having the same effect direction (Fig. 2h, Additional file 1: Fig. S2d). Altogether, these results support that the majority of identified sc-eQTLs are likely to be true positives.

Cell type-specific sc-eQTLs often reside in OCRs shared by multiple cell types

An interesting observation is that most (87.0–92.3%) of sc-eQTLs are unique to one cell type, while the associated sc-eGenes (94.6–98.9%) are almost always expressed in multiple cell types (Figs. 2a and 3a). Specifically, only a small proportion (1.8–6.0%) of sc-eQTLs and their associated sc-eGene expression exhibit the same pattern of cell type specificity. In over 90% of the cases, while the sc-eQTL is observed in one or a subset of cell types, the sc-eGenes are expressed in more cell types. Interestingly, different sc-eQTLs are often observed for the same sc-eGene in different cell types (36.4% of total sc-eQTLs) (Fig. 3b), and these sc-eQTLs tend to be located in different OCRs (34.0% of total sc-eQTLs, Additional file 1: Fig. S2e). This cannot be solely attributed to cell type-specific accessibility of the OCRs, as most of cell type-specific sc-eQTL variants reside in the OCRs accessible across multiple cell types. It is not primarily due to differential accessibility of the OCRs either, as only a small proportion (8.8–19.4%) of sc-eQTLs reside in the DARs specific to the corresponding cell types. In fact, merely a small fraction (11.4%) of sc-eQTLs reside in OCRs with matching cell type specificity as those of the sc-eQTLs (Fig. 3c). For example, the variant rs10793810 is identified as a sc-eQTL specific to MG, influencing the expression of SLC27A6. This variant was predicted to enhance the binding of FOXP2 (highly expressed in MG), to a MG-specific LCRE of SLC27A6 (Fig. 3d). In contrast, the majority (89.1%) of sc-eQTLs are located within OCRs shared among multiple cell types (Fig. 3c), suggesting that the modulation of gene expression by genetic variants is driven by cellular activity of trans-factors and accessibility of cis-elements. For example, the variant rs62308155 is identified as a sc-eQTL specific to Rod, affecting the expression of REST. This variant likely disrupts the binding of NR3C1, which is highly expressed in Rod but minimally in Cone, to a LCRE accessible in both Rod and Cone (Fig. 3e). Furthermore, genome-widely, TFs with motifs perturbed by genetic variants tend to have higher expression levels in the cell types where the variants have sc-eQTL effects compared to the cell type where the variants do not have an effect, even though the OCRs containing the sc-eQTLs are accessible in both cell types (e.g., Rod, one-sided Wilcoxon rank sum test, \(p<0.057\), Fig. 3f, Additional file 2: Table S7).

Fig. 3figure 3

Cell type-specific sc-eQTLs are often within OCRs shared by multiple cell types. a Bar plot showing proportion of sc-eQTLs associated with sc-eQTLs significant in one or more cell types, colored by the number of cell types where the sc-eGene is expressed. b Bar plot showing proportion of sc-eQTLs associated with sc-eGenes significant in one or more cell types, colored by the number of cell types where the sc-eQTL is significant. c Bar plot showing proportion of sc-eQTLs associated with sc-eQTLs significant in one or more cell types, colored by the number of cell types where the OCR containing the sc-eQTL is accessible. d Box plot showing the expression of SLC27A6 in 20 individuals with different genotype of rs10793810 in MG. FOXP2 motif position weight matrix aligned to reference and alternative allele at rs10793810. Violin plot showing the expression level of SLC27A6 and FOXP2 across retinal cell types (n = 50,000 cells). Genome track of SLC27A6 locus showing cell type-specific chromatin accessibility and positioning of rs10793810 within an OCR that is a predicted LCRE of SLC27A6. e Box plot showing the expression of REST in 20 individuals with different genotype of rs62308155 in Rod and Cone. NR3C1 motif position weight matrix aligned to reference and alternative allele at rs62308155. Violin plot showing the expression level of REST and NR3C1 across retinal cell types (n = 50,000 cells). Genome track of REST locus showing cell type-specific chromatin accessibility and positioning of rs62308155 within an OCR that is a predicted LCRE of REST. f Violin plot showing distribution of gene expression levels of TFs in Rod and another cell type. These TFs have binding motifs perturbed by a sc-eQTL that has effect specifically in Rod but not in the other cell type. One-sided Wilcoxon rank sum test was performed to detect difference of gene expression of these TFs between Rod and the other cell type. The p-value and sample size n are indicated in the figure

Significant proportion of sc-caQTLs are cell type-specific in retina

In parallel with sc-eQTL analysis, we also conducted sc-caQTL analysis to identify genetic variants that affect chromatin accessibility, by examining the association between each OCR and common variants within it for each major retinal cell type. A total of 174,419 OCRs (ranging from 54,716 to 95,020 OCRs per cell type) and the same set of variants tested for sc-eQTLs were analyzed (“Methods”). Upon genome-wide multiple testing corrections, 23,287 sc-caQTLs were identified (FDR < 10%). These sc-caQTLs were grouped into 12,482 independent sc-caQTL sets mapped in 10,298 OCRs based on LD (\(^>0.5\)), ranging from 391 to 4789 sc-caQTLs sets per cell type (Fig. 4a, b and Additional file 2: Table S8). The majority (88.0%) of caQTL-associated peaks (sc-caPeaks), which are OCRs containing caQTLs in this study, are associated with only one sc-caQTL set (Additional file 1: Fig. S3a). Similar to sc-eQTLs, the majority of sc-caQTLs are cell type-specific with 62.3–85.7% being unique to one cell type. The effect sizes of sc-caQTLs are correlated across cell types, with stronger correlation observed between more closely related cell types (Fig. 4c and Additional file 1: Fig. S3b). Compared to the correlation of sc-eQTL effect sizes across cell types, those of sc-caQTLs appear to be smaller. This suggests that sc-caQTL effects show higher cell type specificity than sc-eQTLs, possibly due to the greater variation in chromatin accessibility across cell types compared to gene expression. Similar to sc-eQTLs, for sc-caQTLs located in non-promoter OCRs, those shared among multiple cell types displayed significantly greater effect than those unique to one cell type (e.g., Rod, two-sided Wilcoxon rank sum test, \(p=1.09\times ^\), Fig. 4d, Additional file 2: Table S9).

Fig. 4figure 4

Identification of sc-caQTL in retinal cell types. a Bar plot showing the number of independent index sc-caQTLs reaching genome-level FDR < 0.1 per cell type, colored by the number of cell types where a sc-caQTL is significant. b Bar plot showing the number of sc-caPeaks reaching genome-level FDR < 0.1 per cell type, colored by the number of cell types where a sc-caPeak is significant. c Heatmap showing the Pearson correlation of sc-caQTL effect size across retinal cell types. d Violin plot showing the distribution of absolute effect size of the sc-caQTLs located in non-promotor OCRs identified in rod cells, colored by whether sc-caQTLs were identified in one (n = 6102) or more cell types (n = 2085). To compare the effect size of the two types of sc-caQTLs, two-sided Wilcoxon rank sum test was performed, \(p=1.09\times ^\). e Scatter plot showing the effect size of sc-ASCAs (X axis) and the population effect size of the overlapped sc-caQTLs (Y axis). The Pearson correlation coefficient and p-value are indicated in the figure. f Bar plot showing proportion of sc-caQTLs associated with sc-caQTLs that are significant in one or more cell types, colored by the number of cell types where the caPeak is accessible

Validation of sc-caQTLs with sc-ASCA

To assess the quality of the identified sc-caQTLs, we compared them with sc-ASCAs. sc-ASCAs are detected in 8.7–41.8% of the sc-caQTLs tested for sc-ASCAs (with the highest overlapping rate in Rod, 41.8%), on average 15.9-fold enrichment compared to the background variants (two-sided binomial test, \(p<6.8\times ^\), Additional file 1: Fig. S3c). The effect size and direction of sc-ASCAs are positively correlated with those of the overlapped sc-caQTLs (Pearson correlation, \(r\) in 0.75–0.90, \(p<5.4\times ^\)), with the majority (82.4–100%) of the overlapped variants showing the same effect direction (Fig. 4e, Additional file 1: Fig. S3c). These findings support that identified sc-caQTLs are indeed enriched of variants associated with change in chromatin accessibility. Conversely, 33.3–54.5% of the identified sc-ASCAs overlap with sc-caQTLs, depending on cell type. Interestingly, OCRs containing sc-ASCA-only variants (not overlapping with sc-caQTL) were significantly wider than those containing variants that are both sc-ASCA and sc-caQTL (Additional file 1: Fig. S3d, e.g., one-sided Wilcoxon rank sum test, \(p=7.7\times ^\) in Rod). This observation suggests that variants in wider OCRs tend to have a local effect, while variants in narrow OCRs are more likely to affect accessibility of the entire OCRs.

Cell type-specific sc-caQTLs can reside in OCRs accessible in multiple cell types

Interestingly, similar to sc-eQTLs, most (62.3–85.7%) of sc-caQTLs are unique to one cell type, while the majority (74.8%) of sc-caPeaks are accessible in multiple cell types (Fig. 4a, f). Specifically, only 24.4% of sc-caQTLs and their caPeak accessibility share the same pattern of cell type specificity. The remaining 75.6% of sc-caQTLs were found in one or a subset of cell types, while their caPeaks are accessible in more cell types (Fig. 4f). The cell type-specific sc-caQTLs cannot be solely attributed to the differential accessibility of OCRs either, since a small proportion (14.9–34.3%) of sc-caQTLs were observed in the DARs of the corresponding cell types. Interestingly, for sc-caPeaks shared across multiple cell types, a higher proportion is associated with common sc-caQTL variants, compared to those associated with cell type-specific sc-caQTL variants (23.4% vs 10.4% of sc-caPeak per cell type, one-sided Fisher’s exact test, \(p=2.48\times ^\)). As an example where cell type specificity of sc-caQTLs matches with that of their residing OCRs, the variant rs12447029 exhibits a sc-caQTL effect specifically in MG through strengthening the binding of NFE2L2 (which is highly expressed in multiple cell types) to its residing OCR, which is specifically accessible in MG (Fig. 5a). Consistently, the corresponding OCR is a LCRE of GRIN2A, and rs12447029 is a sc-eQTL for GRIN2A in MG (Additional file 1: Fig. S3e). On the other hand, the cell type specificity of many sc-caQTLs cannot be solely attributed to the cell type specificity of the residing OCRs. A large proportion (68.3%) of sc-caQTLs are unique to one cell type but reside in the OCRs accessible in multiple cell types, indicating the modulation of chromatin accessibility by genetic variants is often cell type-dependent and likely through perturbing the binding of cell type-specific trans-factors (Fig. 4f). For example, the variant rs6859300 affects the chromatin accessibility of its residing OCR specifically in Rod, despite this OCR being accessible in multiple cell types (i.e., Rod, Cone, and BC), potentially through enhancing the binding of EPAS1, which is highly expressed in Rod but lowly expressed in Cone and BC (Fig. 5b). Consistently, the corresponding OCR is a LCRE of WWC1, and rs6859300 is a sc-eQTL of WWC1 in Rod (Additional file 1: Fig. S3f). Furthermore, TFs with motifs perturbed by genetic variants tend to have higher expression levels in the cell types where the variants exhibit sc-caQTL effects, compared to the cell type where the variants do not have an effect, even though the OCRs containing the sc-caQTLs are accessible in both cell types. This result supports the role of trans-factors in driving cell type-specific sc-caQTL effects genome-widely (e.g., Rod, one-sided Wilcoxon rank sum test, \(p<0.054\), Fig. 5c, Additional file 2: Table S10).

Fig. 5figure 5

Cell type-specific effect of sc-caQTLs can be driven by trans-factors. a Box plot showing the chromatin accessibility of the OCR harboring rs12447029 in 20 individuals with different genotype of rs12447029 in MG. NFE2L2 motif position weight matrix aligned to reference and alternative allele at rs12447029. Violin plot showing the expression level of NFE2L2 across retinal cell types. Genome track of GRIN2A locus showing cell type-specific chromatin accessibility and positioning of rs12447029 within an OCR which is a predicted LCRE of GRIN2A. b Box plot showing the chromatin accessibility of the OCR harboring rs6859300 in 20 individuals with different genotype of rs6859300 in Rod, Cone, and BC. EPAS1 motif position weight matrix aligned to reference and alternative allele at rs6859300. Violin plot showing the expression level of EPAS1 across retinal cell types. Genome track of WWC1 locus showing cell type-specific chromatin accessibility and positioning of rs6859300 within an OCR which is a predicted LCRE of WWC1. c Violin plot showing distribution of gene expression levels of TFs in Rod and another cell type. These TFs have binding motifs perturbed by a sc-caQTL that has effect specifically in Rod but not in the other cell type. One-sided Wilcoxon rank sum test was performed to detect difference of gene expression levels of these TFs between Rod and the other cell type. The p-value and sample size n are indicated in the figure

In addition, to address potential bias in sc-caQTL detection introduced by cell type abundance difference, we applied a random down-sampling approach to repeat sc-caQTL analysis. The results obtained through this down-sampling analysis consistently aligned with our above findings based on the original dataset, confirming the robustness of the observed sc-caQTL patterns (see “Assessing the potential bias in caQTL detection introduced by cell type abundance differences” section in Additional file 1: Supplementary Note, Additional file 1: Fig. S4).

Interaction among OCRs

Previous studies suggested that multiple regulatory elements can be regulated by a single genetic variant [11]. One possible mechanism is that the accessibility of a “master” element influences the accessibility of neighboring “dependent” elements [11]. To investigate this phenomenon in our dataset, we identified 2511 dependent regions associated with 1942 master regions (“Methods”, Additional file 2: Table S11). Among these, 360 master regions that are LCREs are associated with 427 dependent regions that are also LCREs of the same genes. Notably, we observed a significant enrichment of sc-caQTLs associated with the dependent OCRs (e.g., 1.8-fold enrichment compared to the background variants in Rod, two-sided binomial test, \(p=1.48\times ^\)) as well as dependent LCREs (e.g., 1.7-fold enrichment compared to background in Rod, two-sided binomial test, \(p=4.49\times ^\)), suggesting the association between sc-caQTL/master elements and dependent elements is not random (Fig. 6a). Furthermore, a significant proportion (59.7%) of the master OCRs are co-accessible with at least one of the corresponding dependent OCRs (correlation ≥ 0.2, FDR < 0.05), showing a 2.8-fold enrichment compared to the background (the co-accessibility of two random OCRs within a 250-kb region, two-sided binomial test, \(p<2.2\times ^\)), further supporting the interactions between master and dependent OCRs. The effect size of sc-caQTLs on the master regions and dependent regions are positively correlated (an average correlation coefficient of 0.60, \(p=1.5\times ^\)), and the majority (65.0–82.0%) of sc-caQTLs having the same effect direction on the master and dependent regions (Fig. 6b). Furthermore, we observed a slightly higher enrichment in DARs in the master regions compared to the dependent regions (one-sided Fisher’s exact test, caPeak: \(p=7.1\times ^\); caPeaks associated with LCREs: \(p=0.082\)), as well as an enrichment in active enhancer epigenetic modifications (the concurrent H3K27ac and H3K4me2, one-sided Fisher’s exact test, caPeak: \(p=1.48\times ^\); caPeaks associated with LCREs: \(p=0.10\)) (Fig. 6c).

Fig. 6figure 6

The sc-caQTLs can have effects on multiple genomic regions. a Bar plot showing the proportion of sc-caQTLs or the background variants affecting dependent OCRs and the proportion of sc-caQTLs or the background variants affecting dependent LCREs. The numbers of variants are indicated in the plot. Two-sided binomial test was performed to compare sc-caQTL with the background variants. “*”: 0.05 > p \(\ge\) 0.01, “**”: 0.01 > p \(\ge\) 0.001, “***”: p < 0.001. b Scatter plot showing the effect size of sc-caQTLs on master regions (X axis) and on dependent regions (Y axis). c Bar plot showing the proportion of the master or dependent OCRs that are DARs or have concurrent H3K27ac and H3K4me2 modifications in Rod. The numbers of caPeaks with different features are indicated in the plot. Two-sided binomial test was performed to compare the proportion of master_OCR or dependent_OCR with the proportion of tested_OCR. d Box plot showing the chromatin accessibility of one master OCR and three dependent OCRs in 20 individuals with different genotype of rs7596259 in Rod. Box plot showing the gene expression level of ITGA6 in 20 individuals with different genotype of rs7596259. Genome track of ITGA6 locus showing cell type-specific chromatin accessibility of the master OCR (red) and three dependent OCRs (yellow). The master OCR and one of the dependent OCRs are the predicted LCREs of ITGA6. e Box plot showing the chromatin accessibility of one master OCR and one dependent OCR in 20 individuals with different genotype of rs1493699 in Rod. Box plot showing the gene expression level of PEAK1 in 20 individuals with different genotype of rs1493699. Genome track of PEAK1 locus showing cell type-specific chromatin accessibility of the master OCR (red) and one dependent OCR (yellow). The master OCR and the dependent OCR are the predicted LCREs of PEAK1

While the majority (66.5–87.7%) of master regions are associated with one dependent region, some master regions have multiple dependent regions. For example, the sc-caQTL variant rs7596259 increases the accessibility of its residing master region and is also associated with the increased accessibility of three other dependent regions in Rod (Fig. 6d). This sc-caQTL variant is also a sc-eQTL, leading to increased gene expression of ITGA6 in Rod, suggesting some of the affected regions may play a role in regulating gene expression (Fig. 6d). Indeed, the master region (chr2:173305356–173307494) and one of the dependent regions (chr2:173284642–173285585) are predicted LCREs of ITGA6 (Fig. 6d). Interestingly, sc-caQTLs that affect multiple regions in the same effect direction are more likely to overlap with sc-eQTLs in the corresponding cell type compared to sc-caQTLs that affect multiple regions in different effect directions (in Rod 15.9% vs. 4.2%, two-sided binomial test \(p=2.25\times ^\)). This observation suggests that compensatory effects between cis-elements associated with opposite effect directions may neutralize their impact on gene expression. For example, the sc-caQTL variant rs1493699 reduces the accessibility of its residing master region (chr15:77664198–77665218) and is associated with increased accessibility of a dependent region (chr15:77873253–77874263) in MG (Fig. 6e). Although both elements are LCREs of PEAK1, this sc-caQTL is not a sc-eQTL of PEAK1, suggesting that the two LCREs might compensate for each other, resulting in no overall change in gene expression (Fig. 6e). In addition, the affected OCRs of the majority (71.4%) of sc-caQTLs that impact multiple OCRs and also overlap with cs-eQTLs exhibit co-accessibility, presenting a 1.23-fold enrichment compared to those of sc-caQTLs that impact multiple peaks but do not overlap with cs-eQTLs (two-sided binomial test, \(p=1.96 \times ^\)). This result provides further evidence supporting the idea that multiple OCRs may interact jointly to determine a cs-eQTL.

Massively parallel reporter assays of OCRs harboring sc-eQTLs

To validate the results of our association studies, we conducted Massively Parallel Reporter Assays (MPRAs) to assess the regulatory activities of 931 OCRs housing the index sc-eQTLs identified in human rod cells [35]. These sequences were tested in explanted mouse retinas during postnatal day 0 to 8, which primarily consist of rod cells (Fig. 7a). The MPRA library was designed with synthesized 224 bp oligonucleotides (oligos) centered on the peak summit of these OCRs (based on the human hg19 reference genome), along with positive and negative control sequences (“Methods”). Following the MPRAs, we calculated an activity score for each library sequence and normalized it based on the activity of the basal Crx promoter (“Methods”). As a result, we identified 258 enhancers and 66 silencers that showed at least a twofold higher or lower activity than the basal activity (i.e., activity of Crx basal promoter, q-value < 0.05), as well as 607 inactive sequences that exhibited activity within a twofold change of the basal activity or not significant different from the basal activity (q-value ≥ 0.05) (Fig. 7b, c, Additional file 2: Table S12). A high fraction of OCR sequences identified as inactive could be attributed to several factors: (1) limited sensitivity of the experimental system; (2) differential regulatory activity of CREs between human and mouse; and (3) false positives in identified sc-eQTLs due to a small cohort.

Fig. 7figure 7

Massive parallel reporter assays (MPRAs) of cis-regulatory elements. a A schematic plot showing the workflow of MPRAs. b Density plot showing the distribution of \(_FC\) values of the 1251 tested sequences, colored by the sequence type. c Volcano plot showing the \(_FC\) value (X axis) and the \(-_FDR\) value (Y axis) of each tested sequence (n = 1251). Each dot corresponds to a tested sequence, colored by the activity of the sequence. d Heatmap showing the scaled proportion of the regulatory sequences (i.e., Enhancer, Silencer, and Inactive) overlapped with different transcription factor bindings (i.e., CTCF, CRX, MEF2D, and OTX2) and histone modifications (i.e., H3K4me2 and H3K27ac). e Bar plot showing the proportion of the sequences with the co-binding of OTX2 and at least one of cell type/context-specific factors, CRX, NEF2D, and NRL (Overlap) and the remaining sequences (Non-overlap)

By integrating the MPRA result with bulk ChIP-seq data of the adult human retina, we observed that the validated enhancers are significantly enriched of the binding of photoreceptor-specific TFs, such a

Comments (0)

No login
gif