Allele co-segregation and haplotype diversity of MHC IIβ genes in the small-spotted catshark

Sampling and DNA extraction

We used two groups of S. canicula adult individuals and their progeny kept in aquaria under controlled conditions. The groups differed in source population and in the number of putative parents and offspring, as follows: (1) group 1 included a single mother captured in the wild off Lisbon, Portugal (western Iberian waters), and her 21 offspring, and was held at the Laboratório Marítimo da Guia, University of Lisbon (the number of fathers was unknown); (2) group 2 included seven randomly mating adult individuals (3 fathers and 4 mothers) and their 47 offspring, and was held at the Ozeaneum, Stralsund, Germany. Fin clips were taken from each individual and stored in 98% ethanol at −20 °C until DNA extraction and processing. Genomic DNA (gDNA) was extracted using the Molecular Biology EZ-10 Spin Column genomic DNA minipreps Kit (Bio Basic Inc.) for group 1 samples, whereas group 2 samples were provided as gDNA extractions performed with the Isolate II Blood and Tissue kit (Bioline).

Microsatellite genotyping and family reconstruction

To ascertain the parent–offspring relationships and reconstruct the respective families, all individuals from the two groups of parent–offspring samples were genotyped at 11 microsatellite loci as described by Griffiths et al. (2011). The microsatellite loci were amplified through a single multiplex polymerase chain reaction in a final volume of 10 μl, including: 1 μl of primer mix (Table S1), 1 μl gDNA, 3 μl of autoclaved distilled water and 5 μl of Multiplex Master mix Qiagen (containing the HotStarTaq DNA Polymerase and the multiplex PCR buffer; Qiagen NV, Venlo). The thermocycling conditions consisted of an initial denaturation step at 95 °C for 15 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 60 °C for 90 s and elongation at 72 °C for 45 s and a final extension step at 72 °C for 30 s.

Family reconstruction and parentage analyses were performed for group 2 individuals using Cervus v3.0 (Field Genetics Ltd), which calculates allele frequencies for a given set of microsatellite loci directly from the genotyped individuals, then simulates genotypes and calculates their likelihood ratios (using the likelihood equations of Kalinowski et al. 2007). The likelihood ratios are usually expressed as LOD scores (the natural logarithm of the likelihood ratio). Analyses were performed to assess maternity, paternity and parent-pair relationships for all offspring and to infer the number of offspring for each parent independently and for each parental pair. Only relationships that were consistent across independent analyses and had a positive LOD score were considered valid.
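The acceptance rule above can be illustrated with a minimal sketch. It is not Cervus code: the function names, the input layout (one candidate and one LOD score per independent analysis) and the example values are assumptions made for illustration only.

```python
import math


def lod_score(lik_candidate: float, lik_random: float) -> float:
    """Natural log of the likelihood ratio (candidate parent vs. random individual)."""
    if lik_candidate <= 0 or lik_random <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log(lik_candidate / lik_random)


def is_valid_assignment(assignments):
    """Accept a relationship only if every independent analysis names the
    same candidate and every LOD score is positive.

    `assignments` is a hypothetical list of (candidate_id, lod) tuples,
    one per analysis, for a single offspring and a single parental role.
    """
    if not assignments:
        return False
    candidates = {candidate for candidate, _ in assignments}
    all_positive = all(lod > 0 for _, lod in assignments)
    return len(candidates) == 1 and all_positive


# Hypothetical example: the same mother is recovered by two analyses.
print(is_valid_assignment([("M1", 2.3), ("M1", 0.5)]))
```

A LOD score of zero means the candidate is no more likely to be the parent than a random individual, which is why only strictly positive scores are accepted in this sketch.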

MHC IIβ amplification, library preparation and Illumina sequencing

Genotyping of the MHC IIβ genes was performed for all reconstructed families of S. canicula (as described above) using the two-step PCR amplification, library preparation and sequencing protocols described in Gaigher et al. (2023). Briefly, amplification of exon 2 (β1 domain) of the MHC IIβ loci was performed using two primer pairs: NF2-NR2 was used to co-amplify alleles from lineages A and B, and DF2-DR2 was used to amplify lineage C alleles (Table S2). The resulting amplicons were cleaned with AMPure XP Beads (0.97×) (Beckman Coulter™ Agencourt) prior to the second PCR step, which adds unique barcodes (MK indexes, Meyer and Kircher 2010; Kircher et al. 2012) and Illumina adapters to each cleaned amplicon. The indexed PCR amplicons were cleaned again using AMPure XP Beads (0.8×), checked for quality by 2% agarose gel electrophoresis and quantified using the BioTek Epoch microplate spectrophotometer (Agilent Technologies, Inc.). Samples were pooled in equimolar amounts into primer pair-specific libraries (i.e. NF2-NR2 and DF2-DR2 libraries) and normalized to 20 nM. The libraries were tested for quality, concentration, size and integrity using the 2200 TapeStation System (Agilent Technologies Inc., Santa Clara, USA) and validated using the KAPA Library Quantification Kit (KAPA Biosystem, Inc., Wilmington, USA) for Illumina sequencing platforms, following the manufacturer’s protocol. The two libraries were combined in a 2:1 ratio (NF2-NR2:DF2-DR2) to obtain proportional read coverage between lineages A and B (NF2-NR2) and lineage C (DF2-DR2) alleles. The final library was sequenced with a MiSeq Reagent Kit v2 250PE at the Centre for Molecular Analysis in CIBIO-InBIO (Vairão, Porto, Portugal), using 20% PhiX. Reliability of the sequencing was evaluated by including 20 sample replicates, in addition to PCR blanks.

Illumina data processing and MHC genotyping

The detailed filtering protocol can be found in Gaigher et al. (2023). Briefly, raw reads were demultiplexed using a custom-made Perl script and saved as FASTQ files. Quality and size filtering of reads as well as adaptor trimming were performed with Cutadapt (Martin 2011). Filtered reads were further processed with the DADA2 pipeline (Callahan et al. 2016) as follows: (i) primer trimming from forward and reverse reads, (ii) dereplication of identical reads into unique sequences, (iii) merging of paired reads based on full agreement in the overlapping region, (iv) removal of potential chimeric sequences and (v) extraction of the final amplicon sequence variant (ASV) table. Further filtering of the retrieved ASVs was performed to reduce the number of artefacts. First, samples with a final coverage < 100 sequences and variants with a maximum coverage < 10 were removed from the dataset. Second, the ASVs were aligned in Geneious Prime v22.1, and variants differing from the targeted loci were discarded. Third, variants with frequencies < 1% (per amplicon) were automatically considered as artefacts and removed. As artefacts at higher frequency may still remain in the final ASV table, the final classification of variants as artefacts was based on the following assumptions: (i) variants should amplify similarly across samples (but see Sommer et al. 2013), (ii) artefacts should be less frequent than true alleles and (iii) chimeric sequences or sequences with single base pair mismatches should co-occur with their parent sequences in the same sample (Sommer et al. 2013; Lighten et al. 2014; Biedrzycka et al. 2017; Rekdal et al. 2018; Gaigher et al. 2018). Therefore, variants with low frequencies that were found at higher frequencies in other amplicons (defined as true variants/alleles) were treated as artefacts (Gaigher et al. 2023). Chimeras and single base pair substitutions compared to true alleles were removed (Gaigher et al. 2023).
Once the final sample dataset and the putative true alleles had been defined, we proceeded to lineage attribution for the co-amplified MHC IIβ-A and MHC IIβ-B alleles based on diagnostic nucleotide sites and phylogenetic networks of β1 (exon 2) and β2 (exon 3) (Gaigher et al. 2023). Allele assignment was therefore performed at the lineage level only.
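The numeric thresholds of the first and third filtering steps can be sketched as follows. This is an illustration only, not the published pipeline: the function name, the nested-dictionary layout of the ASV table and the order in which the thresholds are applied are assumptions, and the alignment-based second step and the manual artefact classification are not reproduced.

```python
def filter_asv_table(counts, min_sample_cov=100, min_variant_max_cov=10,
                     min_freq=0.01):
    """Apply coverage and frequency thresholds to an ASV count table.

    `counts` maps sample -> {variant: read count} (hypothetical layout).
    """
    # Drop samples whose total coverage is below the sample threshold.
    kept = {s: dict(v) for s, v in counts.items()
            if sum(v.values()) >= min_sample_cov}

    # Drop variants whose maximum coverage across the retained samples
    # is below the variant threshold.
    max_cov = {}
    for variants in kept.values():
        for var, n in variants.items():
            max_cov[var] = max(max_cov.get(var, 0), n)
    kept = {s: {var: n for var, n in v.items()
                if max_cov[var] >= min_variant_max_cov}
            for s, v in kept.items()}

    # Within each amplicon, drop variants below the frequency threshold
    # (frequency is computed on the reads remaining in that amplicon).
    filtered = {}
    for sample, variants in kept.items():
        total = sum(variants.values())
        filtered[sample] = {var: n for var, n in variants.items()
                            if total > 0 and n / total >= min_freq}
    return filtered


# Hypothetical read counts for three amplicons.
example = {
    "s1": {"a": 500, "b": 480, "c": 4, "d": 12},
    "s2": {"a": 60, "c": 5},
    "s3": {"a": 300, "b": 0, "d": 2},
}
print(filter_asv_table(example))
```

In this example, sample "s2" is removed for low coverage, variant "c" is removed because it never reaches 10 reads, and variant "d" is retained in "s1" but removed from "s3", where it falls below 1% of the reads.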

Assessment of allele co-segregation and haplotype diversity

We used the pattern of allelic segregation within families to reconstruct MHC IIβ haplotypes. The offspring's haplotypes were inferred assuming that the maximum number of alleles can differ between individuals (as shown in Gaigher et al. 2023). In addition, due to the occurrence of multiple paternity and the potential for sperm storage in S. canicula (Griffiths et al. 2012), offspring of the same mother may carry alleles derived from different putative fathers. Because of this mating system, families were defined by the mothers.

From the resulting haplotypes, we investigated the hypothesis of allele co-segregation (linkage) among MHC IIβ lineages in S. canicula given their physical proximity in the genome (Gaigher et al. 2023). For instance, assuming four different alleles (two loci) in each heterozygous parent, we expect to observe a maximum of 16 different haplotypes in the offspring if alleles segregate independently. However, we expect a maximum of only four different haplotypes in the offspring if the two loci are tightly linked. Following this rationale, we deduced the frequency of recombinant haplotypes in our family data, which reflects the linkage between loci. To avoid any bias in detecting recombination events, only parents with different MHC IIβ allelic compositions and with a minimum of five offspring were considered for analysis. The requirement that parents carry different alleles was applied only to MHC IIβ-C, because lineage-specific amplification automatically results in 100% allelic coverage in the offspring when both parents carry the same single allele (the offspring can then carry only one or two identical alleles, and the coverage remains the same).
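The expected counts in this rationale can be checked by enumeration. The sketch below assumes two loci, two parents that are heterozygous at both loci and share no alleles, and it reads the counts of 16 and 4 as the number of distinct combinations of maternally and paternally transmitted haplotypes among the offspring. All names and allele labels are hypothetical.

```python
from itertools import product


def gametes(parent, linked):
    """Return the set of haplotypes a parent can transmit.

    `parent` is a pair of haplotypes, each a tuple with one allele per locus.
    """
    hap1, hap2 = parent
    if linked:
        # Tight linkage: only the two parental haplotypes are transmitted.
        return {hap1, hap2}
    # Independent segregation: any combination of one allele per locus.
    return set(product(*zip(hap1, hap2)))


def offspring_combinations(mother, father, linked):
    """All (maternal haplotype, paternal haplotype) pairs in the offspring."""
    return set(product(gametes(mother, linked), gametes(father, linked)))


def recombinant_fraction(transmitted, parent):
    """Fraction of transmitted haplotypes that match neither parental haplotype."""
    parental = set(parent)
    recombinants = sum(1 for hap in transmitted if hap not in parental)
    return recombinants / len(transmitted)


mother = (("A1", "B1"), ("A2", "B2"))
father = (("A3", "B3"), ("A4", "B4"))
print(len(offspring_combinations(mother, father, linked=False)))
print(len(offspring_combinations(mother, father, linked=True)))
```

Under independent segregation each parent can transmit four haplotypes (4 × 4 combinations), whereas under tight linkage each parent transmits only its two parental haplotypes (2 × 2 combinations). Any transmitted haplotype outside the parental pair counts as a recombinant in `recombinant_fraction`.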

High levels of polymorphism within the MHC region can result in allele- or locus-biased amplification when using a single pair of primers to co-amplify two or more loci. Consequently, a missing allele or locus can be due to a methodological bias; thus, new lineage-specific primer pairs were designed to confirm allele identity and exclude possible biased amplification by the original primer pairs. Specifically, five new primer pairs were designed targeting both the β1 (exon 2) and the β2 (exon 3) domains of the MHC IIβ genes (Fig. S1; Table S2). Each exon was amplified using a master mix with a 5-µl total volume, including 2.5 µl of MyTaq HS Master mix, 1.5 µl of autoclaved water, 0.2 µl of each primer (10 µM) and 0.6 µl of the gDNA. The thermocycling conditions consisted of an initial denaturation step at 95 °C for 3 min, followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at each primer-pair Ta (Table S2) for 30 s, extension at 72 °C for 30 s and a final extension step at 60 °C for 10 min. The obtained amplicons were cleaned of excess primers and dNTPs with 1 µl of ExoSap-IT™ (Thermo Fisher) and processed for Sanger sequencing of the forward and reverse strands using the BigDye™ Terminator v3.1 Cycle Sequencing Kit following the manufacturer’s instructions. All products were sequenced on an ABI Prism 3130 Genetic Analyzer. The final Sanger sequences were manually inspected and edited in Geneious Prime and aligned to reference alleles attributed to each MHC IIβ lineage (A, B and C; Gaigher et al. 2023). The presence and identity of alleles and lineages were confirmed when different primers targeting the same exon yielded similar results.

Genetic diversity at the sequence and allelic levels within each of the reconstructed MHC IIβ haplotypes was calculated using different metrics: (i) nucleotide p-distances, (ii) amino acid p-distances and (iii) amino acid functional distances. While p-distances between alleles within haplotypes were estimated with MEGA11 (Tamura et al. 2021), the functional distance was calculated with Grantham's distance considering the physicochemical properties of the respective amino acids using a Perl script from Pierini and Lenz (2018).
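For reference, a p-distance is the proportion of compared sites at which two aligned sequences differ. The following sketch illustrates that definition for alleles within one haplotype; it is not the MEGA11 implementation, the function names are hypothetical, and skipping gapped or missing sites (pairwise deletion) is an assumption. The Grantham-based functional distance is not reproduced here.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1, seq2):
        # Assumption: sites with a gap or missing data are skipped.
        if a in "-?" or b in "-?":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_pairwise_p_distance(alleles) -> float:
    """Mean p-distance over all pairs of alleles in one haplotype."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        raise ValueError("at least two alleles are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned nucleotide sequences differing at 2 of 8 sites.
print(p_distance("ACGTACGT", "ACGTTCGA"))
```

The same functions apply to amino acid alignments, since they only compare characters position by position.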

Genetic architecture of MHC IIβ

To gain further insights into the genetic architecture of MHC IIβ allelic lineages in S. canicula, and based on observations from the reference genome (sScyCan1.1, GenBank accession no. GCF_902713615.1), we explored the hypothesis that the alleles in each allelic lineage were derived from distinct MHC IIβ loci. Specifically, we expect that allelic lineages derived from distinct loci should exhibit divergent untranslated regions (UTRs; e.g. Okamura et al. 1997). To test this, full transcripts from the three MHC IIβ genes in the reference genome, belonging to each of the three allelic lineages, were obtained from NCBI (accession nos. XR_005462827, XM_038815839 and XM_038816327, for lineages A, B and C, respectively), and their 5′ and 3′ UTRs were aligned in Geneious Prime® 2023.0.1 using the MUSCLE algorithm. In addition, if allelic lineages segregate at distinct loci, the number of alleles per lineage per individual should never exceed two and should be independent between lineages. These conditions were assessed using data on the allele composition at MHC IIβ genes for 25 unrelated individuals of S. canicula obtained in Gaigher et al. (2023).
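The first of these conditions, at most two alleles per lineage per individual, is straightforward to check programmatically. The sketch below assumes a hypothetical data layout (individual, then lineage, then a set of allele names) and invented allele labels; it does not test independence between lineages.

```python
def alleles_per_lineage(genotypes):
    """Count alleles per lineage for each individual.

    `genotypes` maps individual -> {lineage: set of allele names}
    (hypothetical layout).
    """
    return {ind: {lineage: len(alleles) for lineage, alleles in lineages.items()}
            for ind, lineages in genotypes.items()}


def lineage_violations(genotypes, max_alleles=2):
    """List (individual, lineage, n_alleles) entries exceeding the diploid maximum."""
    return [(ind, lineage, n)
            for ind, counts in alleles_per_lineage(genotypes).items()
            for lineage, n in counts.items()
            if n > max_alleles]


# Hypothetical genotypes; the second individual breaks the expectation.
example = {
    "ind1": {"A": {"A*01", "A*02"}, "B": {"B*01"}, "C": {"C*01", "C*03"}},
    "ind2": {"A": {"A*01", "A*02", "A*05"}, "B": {"B*02"}, "C": {"C*01"}},
}
print(lineage_violations(example))
```

An empty list is the outcome expected if each lineage corresponds to a single diploid locus.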
