Self-reported birthweights in the UKBB ranged from 0.45 kg to 8.00 kg, with a mean of 3.35 kg and a standard deviation of 0.65 kg. When we restricted the range of birthweight from 2.50 kg to 4.50 kg, the mean birthweight was 3.39 kg with a standard deviation of 0.42 kg. ESM Fig. 1 shows the distribution of birthweights using these two selection schemes. Both distributions were approximately normal (assessed by eye), with the full sample showing a right skew with birthweight values as large as 8 kg.
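The difference between restricting the range and winsorising can be illustrated with a minimal sketch. It assumes the 2.50 to 4.50 kg restriction is inclusive at both ends. The example birthweights are hypothetical. The winsorisation bounds are placeholders only, since this section does not state the thresholds used in the study.

```python
def truncate(values, low=2.5, high=4.5):
    """Drop observations outside [low, high]; the sample size shrinks."""
    return [v for v in values if low <= v <= high]


def winsorise(values, low, high):
    """Clip observations to [low, high]; every individual is retained."""
    return [min(max(v, low), high) for v in values]


# Hypothetical birthweights in kg, for illustration only
weights = [0.45, 2.1, 2.5, 3.2, 3.4, 4.5, 5.2, 8.0]

truncated = truncate(weights)
# Assumed bounds: placeholders, not the thresholds used in the study
winsorised = winsorise(weights, low=2.5, high=4.5)
```

Truncation removes the extreme individuals and so reduces the sample size, whereas winsorisation keeps them in the analysis but limits how far their values can pull on the estimates.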
GWAS of birthweight in the UKBB
ESM Figs 2–6 show Q–Q plots and genomic inflation factors for each of the GWAS (λ between 1.11 and 1.43; ESM Table 3). However, LD score regression intercepts (ESM Table 3) were all between 1.02 and 1.12, suggesting that the majority of the inflation in the λ scores was due to genuine polygenic signals. The GWAS of winsorised birthweight was the only GWAS with an LD score intercept above 1.1, consistent with a slight inflation. Manhattan plots for all GWAS can be found in ESM Figs 7–11. Variants at 15 loci reached genome-wide significance across the two case–control GWAS (ESM Table 4, ESM Figs 7, 8). This small number of significant loci contrasts with the high numbers detected in the continuous GWAS (see below). In addition, SNPs at all but one of the loci in the case–control GWAS had been previously associated with birthweight, suggesting that dichotomising the phenotype added little in terms of locus discovery compared with continuous GWAS, although a handful of loci did attain lower p values in the high birthweight GWAS (ESM Table 4). The lead SNP at the one novel locus from the high vs normal birthweight GWAS (p=9.9×10−9), rs67254669, is a physically genotyped, low-frequency (MAF=0.001) missense variant in the ABCC8 gene. This SNP was also significantly associated with birthweight, with a very large effect size, in two of the continuous birthweight GWAS: when we included all individuals (β=0.199 kg per addition of the minor G allele, SE=0.026, p=4.2×10−14) and when we winsorised the distribution (β=0.170 kg per addition of the minor allele, SE=0.022, p=1.3×10−14). No other variant in the region reached genome-wide significance, potentially due to the lead variant's low frequency and lack of LD with surrounding markers.
Nevertheless, the low-frequency allele also showed nominally significant association with increased risk of gestational diabetes mellitus and type 2 diabetes in publicly available FinnGen data [33] (gestational diabetes: logistic β=0.619 per addition of the minor G allele, SE=0.166, p=2×10−4; type 2 diabetes: logistic β=0.299 per addition of the minor G allele, SE=0.070, p=2×10−5), and decreased (inverse rank normal transformed) glucose levels (p=1.9×10−4), but not type 2 diabetes, HbA1c or offspring birthweight (all p>0.05), in publicly available GWAS summary statistics from the UKBB published on the Neale website (https://www.nealelab.is/). The variant was not available in the publicly available deCODE GWAS summary statistics for birthweight.
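As a rough illustration of two quantities used above, the sketch below computes a genomic inflation factor (λ) from a list of p values and a two-sided p value from an effect estimate and its standard error. It assumes the conventional definition of λ, the median observed 1-df χ² statistic divided by its null median, and a simple normal-theory Wald test. The function names are illustrative and this is not the software used in the study. Because the reported β and SE are rounded, and the GWAS software may use a different test, the output will only approximate the reported p values.

```python
import math
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Lambda: median observed 1-df chi-square divided by its null median."""
    norm = NormalDist()
    # Convert each two-sided p value (0 < p <= 1) back to a chi-square statistic
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def wald_p(beta, se):
    """Two-sided p value for z = beta / se under a standard normal null."""
    z = abs(beta / se)
    # erfc stays accurate in the far tail, unlike 2 * (1 - cdf(z))
    return math.erfc(z / math.sqrt(2.0))


# Evenly spaced p values mimic a null GWAS, so lambda should be close to 1
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)

# Rounded estimates for rs67254669 in the full-sample GWAS
p_abcc8 = wald_p(0.199, 0.026)
```

A λ above 1 can reflect either confounding or polygenicity, which is why the LD score regression intercept is reported alongside it.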
Figure 1 presents −log10p values from the three continuous GWAS in the present study for 196 SNPs robustly associated with birthweight, which were previously identified/confirmed in the deCODE study and present in the current study [17]. The graphs clearly show that the EGG Consortium strategy of performing GWAS on the truncated distribution of birthweight reduces the signal at these known variants on average. This is despite the likely presence of ‘winner’s curse’ in the selection of the 196 variants (i.e. the deCODE paper used EGG Consortium data where the distribution of birthweight in the UKBB was truncated, and so variant selection is biased towards those variants that do well in the truncated GWAS), which is likely reflected in the more similar performance of the strategies in that part of the p value distribution close to the cut-off for genome-wide significance (where the effect of winner’s curse will be greatest). In contrast, winsorisation performed the best on average amongst the three strategies in terms of maximising the signal at these known loci. The implication is that the winsorising strategy is also likely to perform better in terms of identifying novel loci. Consequently, we focus on presenting the results from these analyses in the main part of the paper.
Fig. 1 The −log10p values of genome-wide significant SNPs from the deCODE GWAS of own birthweight in the full birthweight GWAS vs the truncated GWAS (a), the full birthweight GWAS vs the winsorised GWAS (b) and the truncated GWAS vs the winsorised GWAS (c). In (a), 117 SNPs had lower p values in the full birthweight GWAS and 79 had lower p values in the truncated GWAS. In (b), 180 SNPs had lower p values in the winsorised GWAS and 16 SNPs had lower p values in the full GWAS. In (c), 150 SNPs had lower p values in the winsorised GWAS and 46 SNPs had lower p values in the truncated GWAS
The GWAS of the winsorised birthweight distribution resulted in 270 lead SNPs at 178 loci reaching genome-wide significance (ESM Table 5, ESM Fig. 9), compared with only 120 lead SNPs at 94 loci when analysing birthweights between 2.5 and 4.5 kg (ESM Table 6, ESM Fig. 10), and 186 lead SNPs at 143 loci when analysing the full distribution of birthweights (ESM Table 7, ESM Fig. 11) (there were also a small number of SNPs that were significant in the truncated GWAS/full sample but that were not significant in the winsorised sample). This included 27 variants that were not within ±500 kb of a SNP reaching p<5×10−8 in the previous EGG or deCODE birthweight GWAS (Table 1, ESM Table 8, ESM Figs 12–38). Of the 27 variants at these new loci, we note that nine of the SNPs had stronger evidence of association in the larger deCODE study (compared with the truncated UKBB results), six had less strong evidence and 12 were not reported. Additionally, several have been previously associated with cardiometabolic and/or anthropometric phenotypes at genome-wide levels of significance, and so represent good candidates for genuine associations with birthweight (ESM Table 8). Interesting variants include those in ABCC8 (discussed above) and a variant in a long non-coding RNA that contains antisense instructions for the gene SLC16A1. The robustness of all these associations will need to be confirmed in future GWAS.
Table 1 Novel loci identified in the continuous GWAS analysis using the winsorisation of birthweight method
One of the reasons for excluding extreme birthweight measurements was to avoid detecting loci that were primarily associated with gestational age rather than birthweight. In the case of the dichotomous GWAS, we found that SNPs at three genome-wide significant loci exhibited nominal associations with the maternal and/or fetal GWAS of gestational age (p<0.05, variants at ADCY5 [both], AMZ1:GNA12 [maternal] and LINC00880 [fetal]) (ESM Table 4). For the winsorised GWAS of birthweight, we found that 24 of the sentinel genome-wide significant SNPs were also nominally associated with own gestational age and 38 with maternal gestational age (5×10−8<p<0.05; ESM Table 5), including one (at RP11-542A14.1) that was also genome-wide significant in the maternal gestational age GWAS. Of the 27 variants at loci detected with the winsorisation method and deemed to be novel (Table 1, ESM Table 8), 11 were available for analysis with mtCOJO. Most of these SNPs showed a slight attenuation in their p value compared with the birthweight GWAS; however, evidence for association with birthweight remained strong.
Genetic correlations
We performed bivariate LD score regression analyses to investigate the degree of genetic similarity between low birthweight, high birthweight and birthweight within the normal range (i.e. from the truncated birthweight GWAS). We found that high birthweight was strongly genetically correlated with birthweight within the normal range (genetic correlation coefficient [rG]: 0.91; 95% CI 0.83, 0.99; Fig. 2, ESM Table 9), whereas the magnitude of the genetic correlation between low birthweight and birthweight in the normal range was slightly lower (rG: −0.74; 95% CI −0.82, −0.66; Fig. 2, ESM Table 9). In addition, the low birthweight trait exhibited an increased SNP-based heritability (hSNP2) compared with the other traits (hSNP2=0.26 for low birthweight, hSNP2=0.03 for high birthweight, hSNP2=0.11 for both all birthweights and winsorised birthweight and hSNP2=0.08 for truncated birthweight) (ESM Table 3), despite fewer loci reaching genome-wide significance. Low birthweight was moderately positively genetically correlated with many cardiometabolic traits (coronary artery disease, type 2 diabetes, systolic and diastolic blood pressure etc.), whereas high birthweight showed mostly low, non-significant negative genetic correlations with the same traits and positive genetic correlations with adiposity and anthropometric traits (height, BMI, obesity, waist and hip circumference etc.) (Fig. 2, ESM Table 10).
Fig. 2 Genetic correlation (rG) between either high (triangles) or low (circles) birthweight and cardiometabolic-related phenotypes. The colour scale represents the strength of genetic correlation from −1 (dark blue) to 1 (dark red). A genetic correlation of exactly zero would be shown as white. BW, birthweight; T2D, type 2 diabetes