Haplotypes affecting stillbirth and fertility in Icelandic Dairy Cattle

Genotypes

Genotype data was acquired from the database of the Farmer’s Association of Iceland. The animals were genotyped with Illumina BovineSNP50 version 2 and 3 or BovineHD SNP chip (Illumina, San Diego, USA) (Gautason et al. 2021b), and the EuroG MD chip (Boichard et al. 2018). The markers were mapped to the ARS-UCD1.2 cattle genome assembly (Rosen et al. 2020) and only autosomal markers were used. We removed SNP markers that were missing in more than 5% of animals and removed animals with call rate lower than 95%. Only animals with registered pure Icelandic Cattle ancestry were used. This resulted in 20,557 animals with 43,143 common markers, 870 of which were monomorphic. For GWAS, we removed animals with heterozygosity 3 standard deviations above the mean, markers deviating from Hardy–Weinberg proportions (P < 1e-6), and markers with MAF lower than 0.01. After these filtering steps, 35,481 markers remained for GWAS.

Phenotypes

Phenotype data was obtained from the database of the Farmer’s Association of Iceland. The data included the pedigree of Icelandic Cattle, calving registrations that were recorded by farmers between January 2004 and August 2023, insemination data registered by AI technicians from 2008 until September 2024, and registered culling reasons. The insemination records and stillbirth records were separately filtered in line with the procedures used in the routine genetic evaluation as described below. Because the data are from separate sources and the traits in question different, the filtering procedures were not exactly the same. The extracted pedigree contained 677,129 animals. We added all stillborn calves to the pedigree and then traced it to animals with records and pruned the pedigree three generations back using DmuTrace Version 2 (Madsen 2010). After this filtering, the pedigree contained 472,743 individuals.

Stillbirths

The birth registrations included the fate of the calf in several categories. The following categories were defined as stillbirths: stillborn, died during calving, and died within 24 h of calving. Abortions were not considered as stillbirths. Each calving was assigned 0 for a dead calf, and 1 for a live calf. We removed twin calvings and used up to four calvings for each dam for GWAS, and up to five calvings for analysis of haplotype effects. We only used calving records for which the cow was more than 19 and less than 37 months old at first calving. These limits were set to remove extreme cases, as cows younger than 20 months and older than 36 months have substantially increased frequency of stillbirths.

Fertility

Before filtering, the dataset contained information on 168,940 animals and 697,309 artificial inseminations performed between 2008 and 2024. Only inseminations of maiden heifers (lactation 0) and the first three lactations (1–3) were included. We used these records to derive four fertility traits, namely conception rate (CR), calving interval (CI), interval from first to last insemination (IFL), and number of inseminations per service period (AIS). Our methods followed those described by Þórarinsdóttir et al. (2021), which are also used in the routine genetic evaluation. Conception rate is a measure of whether a cow conceived in her first insemination or not. CR was defined as a binary trait and computed based on insemination and calving records. CR was defined in a way that if IFL was in the interval 0–4 days and the animal calved 260–302 days later, then CR was set to 1 (success). If IFL was 5 days or more and the animal calved 260–302 days later then CR was set to 0 (failure). CR was also set to 0 if the cow was inseminated at least once but did not calve. Before analysis, the data was filtered to exclude animals with errors in recording of inseminations or calvings. While editing data, intervals were kept if within the following limits: age at first calving (550–1100 days), CI (280–600 days), age at first insemination (270–900 days), interval from calving to first insemination (20–230 days), IFL (0–365 days), interval from calving to last insemination (20–365 days), gestation length (260–302 days), and AIS (1–8). Records for lactations 2 and 3 were excluded if information about lactation 1 was unavailable and records for lactation 3 were excluded if information on lactation 1 or 2 was missing.

We also used insemination records to calculate non-return-rates at 56 days (NRR56). We excluded insemination records if the cow was re-inseminated less than 3 days after previous insemination. Records were also removed if gestation length, that is the interval from first insemination to calving, was shorter than 266 days. A cow was assigned a value of 1 if she was not re-inseminated within 56 days of her first insemination. Otherwise, if the cow was inseminated in the interval from day 3 to day 55, the cow was assigned 0. We further filtered the data to ensure that each fixed effect class contained at least 10 observations for all fertility trait analysed.

Homozygous haplotype deficiency

We used the software findhap version 3 (VanRaden et al. 2011b; 2015) to construct haplotypes from the genotype data. We constructed haplotypes of 75 and 25 markers length. The program was run in three steps, first dividing the genotype data into 600 marker segments, then 212 markers and finally identifying haplotypes with length up to 75 or 25 markers. Haplotypes with frequency lower than 1% for the 75-marker haplotypes and 5% for the 25-marker haplotypes were discarded from further analysis. For each haplotype, we counted the number of homozygous individuals observed in the genotype data and used haplotypes with zero observed homozygous individuals for further analysis. Following VanRaden (2011a), we used two methods to estimate the number of expected homozygous animals, the simple method based on carrier frequency assuming random mating and another approach based on the carrier status of mating pairs. For the simple method, we identified the number of expected homozygous individuals for each haplotype as the squared carrier frequency divided by 4, multiplied with the number of genotyped animals. For the mating method, we used the pedigree to estimate the probabilities of observing homozygous individuals. For each genotyped animal, we checked whether the sire and the dam were genotyped. If both were genotyped, we counted the instances when both sire and dam were carriers. If the dam was not genotyped, we checked whether the maternal grand sire was genotyped. We counted the number of instances when both sire and maternal grand sire were carriers. Then the expected number of homozygous animals was computed as the sum of (carrier sire) × (carrier dam), and (carrier sire) × (carrier maternal grand sire) matings divided by 4. This method assumes that the carrier frequency in maternal grand sires and maternal grand dams is equal. For the simple method, probabilities of observing 0 homozygotes were computed as \(P=^/4\right)}^\) where C was carrier frequency and N was number of genotyped animals. For the mating method, the probability was computed as 0.75 raised to the power of the sum of (carrier sire) × (carrier dam) and (carrier sire) × (carrier maternal grand sire) matings (VanRaden et al. 2011a). The haplotypes received an ID based on their position and haplotype number within position as defined by the software findhap.

Haplotype effectsStillbirth

We used the following statistical model to estimate the effect of HHD haplotypes on stillbirth. We only used calving records where both sire and dam, or both sire and maternal grand sire were genotyped. This resulted in 215,989 calving records. We applied logistic regression using the following model in R (R Core Team 2019) version 4.4.0:

$$\begin_=\mu +_+_+_+\beta \times \left(_\right)+_\end$$

(1)

where \(Y\) was stillbirth, \(\mu\) was the intercept, Parity was parity of the dam taking values from 0 (for maiden heifers) to 4, sex was the sex of the calf, and \(Year\times Month\) was year and month of calving. \(_\) was computed with the following equation if the dam and sire were genotyped:

$$\begin_=0.5_\times 0.5\times _\end$$

(2)

If the sire and maternal grand sire were genotyped, the following equation was used:

$$\begin_=0.5_\times 0.25\times _\end$$

(3)

\(_\), \(_\) and \(_\) took values of 0 or 1 depending on their carrier status.

Rate of insemination failure as a function of carrier status

We did an analysis similar to Kadri et al. (2014) to test the effect of mating type on insemination failure measured as NRR56. The final dataset contained 136,475 insemination records with values for NRR56. We defined four classes of mating depending on carrier (C) and non-carrier (NC) status: (1) NC sire × NC maternal grand sire, (2) NC sire × C maternal grand sire, (3) C sire × NC maternal grand sire, (4) C sire × C maternal grand sire. The probability of both parents to be carriers for the putative lethal allele is (1) 0, (2), 0, (3) p, and (4) \(\frac\), where p is the frequency of the haplotype. The probability of the conceptus to be homozygous is the probability of both parents being carriers multiplied with 0.25: (1) 0, (2), 0, (3) 0.25p, and (4) 0.25\(\frac\). Assuming that the allele is lethal in homozygous state, we expect that reproductive failure is increased by the probability (1) 0, (2), 0, (3) 0.25p(1-f), and (4) 0.25 \(\frac\left(1-f\right)\) (Kadri et al. 2014). The background reproductive failure, f, was computed as the weighted mean of the average NRR56 value for mating types 1 and 2. We used the following linear mixed model to estimate the effects of haplotypes on insemination failure:

$$\begin_=\mu +_+_+_+_+_\end$$

(4)

where μ was the mean, Parity was the fixed effect of parity 0, 1, 2, or 3, and Year was the fixed effect of year of insemination. Mating type was the fixed effect of mating type and took values of 1, 2, 3 or 4 depending on the genotype of the sire and genotype of the sire of the dam. u was the random additive genetic effect of the inseminated cow, with distribution ~ N(0, A\(_^\)) where A was the numerator relationship matrix and e was a random residual effect with distribution ~ N(0, I\(_^\)). We estimated variance components and fit the model using Average information restricted maximum likelihood (AIREML) (Jensen et al. 1997) using the function dmuai of the DMU software package version 6, release 5.6 (Madsen & Jensen 2013). We plotted results using ggplot2 (Wickham 2016) version 3.5.1.

Phenotypes for genome-wide association studies

We used linear mixed models to construct corrected phenotypes for genotyped cows with observations for stillbirth and fertility. We estimated variance components and predicted breeding values (EBV) using AI-REML in DMU.

Stillbirth

To estimate variance components for stillbirth, we applied more strict filtering procedures in addition to those listed in the Phenotype section. We used calving records from 2009 to 2023. We removed all records within fixed year classes with 20 or fewer observations. We only included herds that were present throughout the period to avoid any bias due to management changes for herds starting or leaving production. We constructed a herd-year variable and removed herds for which there were fewer than 4 calvings in any herd-year class. This resulted in 223,179 records for estimation of variance components and 309,345 records in the dataset used to predict breeding values. On average, 26.0% of heifer calvings, and 9.2% of later calvings in the data resulted in stillborn calves. We defined 21 unknown parent groups to model the genetic trend in unknown sires and dams. These groups were fit as random effects in the models. The following linear bivariate model was applied:

$$\begin_^=_+_+_+_+_+_+_+_+_\end$$

(5)

where y is the phenotype of calf i, H5Y is the fixed effect of herd by calving year group, where the groups were defined by five year periods, D is fixed effect of age of dam at calving in months, M is fixed effect of month of calving, S is fixed effect of the sex of calf, HY is the random effect of herd by year of calving, m is the random additive genetic maternal effect of dam, and u is the random additive genetic effect of the calf, PE is the random effect of dam’s permanent environment, and e is the random residual effect. Effects of m and u were assumed to be drawn from.

 ~ N \(\left(0, \mathbf\otimes \left[\begin\begin_^& _\\ _& _^\end& \begin_& _\\ _& _\end\\ \begin_& _\\ _& _\end& \begin_^& _\\ _& _^\end\end\right]\right)\) where \(_^\) and \(_^\) are the additive genetic variances for individual and maternal effects, and A is the numerator relationship matrix. The random effect of HY had distribution ~ N \(\left(0, \mathbf\otimes \left[\begin_^& _\\ _& _^\end\right]\right)\) where I is an identity matrix, the permanent environmental effect was assumed to be drawn from ~ N(0, I\(_^\)), and the residual effect was assumed to be drawn from ~ N \(\left(0, \mathbf\otimes \left[\begin_^& 0\\ 0& _^\end\right]\right)\). The superscript denotes whether the dam gave birth to her first calf (1) or subsequent calvings (2). The phenotype y was set to 0 for stillbirths and 1 for live calves. The solutions for m and u gave predicted breeding values for four traits: daughter stillbirth at 1st calving (DSB1), sire stillbirth at 1st calving (SSB1), daughter stillbirth at calving 2–4 (DSB2) and sire stillbirth at calving 2–4 (SSB2). We computed the phenotype of animal i for stillbirth corrected for environmental effects as the sum of the predicted breeding value, permanent environment (for DSB2), and residual:

$$\begin_^=_+_\\ _^=_+_+ }_\\ _^=_+_\\ _^=_+_\end$$

(6)

where EBV was the predicted breeding value and \(_\) and \(_\) were the residuals according to model 5. For \(_\), we used the mean of the residuals \(\left(}_\right)\) when the cow had more than one calving. This resulted in a total of 9548 and 9313 animals with corrected phenotypes for direct stillbirth and maternal stillbirth, respectively. We computed the heritability as \(_^=\frac_^}_^+_^+_^+2_}\) and \(_^=\frac_^}_^+_^+_^+_}\) and similarly for DSB2 and SSB2.

Fertility

The following linear repeatability mixed model was used for AIS and IFL:

$$\beginy=HY+Age+IYM+Parity+u+PE+e\end$$

(7)

where HY was the fixed effect of herd by year. Year of birth was used for maiden heifers and calving year was used for older cows, Age was fixed effect of age of cow in months. Age at first insemination was used for heifers and age of calving was used for older cows. IYM was the random effect of insemination year-by-month, u was the random additive genetic effect of the animal with distribution ~ N(0, A\(_^\)), PE was the permanent environmental effect of the cow, with distribution ~ N(0, I\(_^\)), and e was a random residual effect with distribution ~ N(0, I\(_^\)). For CR, we used the model in Eq. (7) but also included the fixed effect of technician at insemination. For CI, we used the following model:

$$\beginy=HY+Age+Parity+u+PE+e\end$$

(8)

where HY was herd by year of calving and Age was age at calving in months, and other terms were the same as above. We used the same pedigree and unknown parent groups as for stillbirth. We then computed the corrected phenotype for these traits as the sum of EBV, PE, and mean residual:

$$\begin_^=_+_+ }_\end$$

(9)

where \( }_\) was the mean residual for animal i. We used 6406, 6410, 6407, and 5591 animals for IFL, AIS, CR, and CI. We also constructed a phenotype for the trait infertility. We obtained records of cows that had never calved and were registered as having been culled due to infertility. These cows were assigned 0 and cows that had calved were assigned 1. There were 107 cows assigned 0 and 10,106 assigned 1. We did not predict breeding values for infertility but used the raw phenotype for GWAS.

Genome-wide association analysis

The genome-wide association studies were conducted using the software LDAK version 5.2 (Speed et al. 2012). We used a linear mixed model regression for all traits:

$$\begin_^=\mu +b_+_+e\end$$

(10)

where \(_^\) was phenotype of animal i as described above, b was the allele substitution effect of the SNP, g is the number of copies of the allele in animal i, u is the polygenic effect of individual i with distribution ~ N(0, G\(_^\)) where \(_^\) is the polygenic genetic variance and G is the genomic relationship matrix, and e is the random residual for animal i, with distribution ~ N(0, I\(_^\)).

Genome-wide significance threshold was set to 1.41e-6 with a Bonferroni correction for 35,481 tests. We used the arguments –calc-kins-direct GRM –ignore-weights YES –power −1 with LDAK to set up the genomic relationship matrix using 35,481 markers and 20,328 animals. We used the arguments –linear and –grm to run the GWAS. We considered an association suggestive if it was among the 10 markers with the lowest P-values for each trait even though the association had not reached the significant threshold after Bonferroni multiple testing correction. We checked whether each of these suggestive markers was within 1 Mb of an HDD haplotype, and whether it was in linkage disequilibrium (LD) with that of HDD haplotype. We used PLINK (Purcell et al. 2007) to compute LD and assumed that r2 greater than 0.5 indicated linkage. For markers co-locating with HHD haplotypes, we identified the protein-coding genes closest to the marker according to the ARS-UCD 1.2 genome build on ensembl (Yates et al. 2020). We plotted the P-values by chromosomes and the expected and observed distribution of P values using the R package ggmanh (Lee & Lee 2024).

Search for chromosomal deletion

To look for candidate deletion regions, we computed probability of deviation from Hardy–Weinberg proportions using the –hardy command in PLINK. For each HHD region, we plotted P-values for deviation from Hardy–Weinberg proportions, and the mean log R values of carriers and non-carriers, to look for indications of large deletions.

Identification of HHD haplotypes

We present HHD haplotypes that fulfilled one or more of the following criteria:

(1)

Had a statistically significant effect on stillbirth according to Eq. (1).

(2)

Had significant effects on rate of insemination failure according to Eq. (4).

(3)

Co-located with an association according to GWAS as described above. A non-significant association was considered suggestive if markers were among the top 10 SNPs with lowest P-values.

Comments (0)

No login
gif