Brain MRI examinations were retrospectively collected from ongoing research studies at three hospitals: Lausanne University Hospital (H1; CHUV, Lausanne, Switzerland), Hospital Clínic de Barcelona (H2; Barcelona, Spain) and La Timone (H3; Marseilles, France). Exclusion criteria were twin pregnancies and any pathology or malformation visible on the fetal MRI scans. The study was approved by the institutional review board of each center (CHUV: CER-VD 2021-00124, Hospital Clínic: HCB/2022/0533, La Timone: Aix-Marseille University N°2022-04-14-003). Fetal examinations were approximately evenly distributed across three gestational age (GA) bins representing different stages of fetal brain development: [21, 28) weeks, [28, 32) weeks, and [32, 36) weeks. A flow diagram of included and excluded MRI examinations is shown in Fig. 1.
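The GA bins are half-open intervals, so a fetus at exactly 28 or 32 weeks falls into the later bin. The following minimal Python sketch shows that assignment rule. The function and constant names are illustrative, and GA is assumed to be expressed in decimal weeks.

```python
from bisect import bisect_right
from typing import Optional

# Bin edges in weeks; each bin is half-open [lower, upper), as in the text.
GA_EDGES = [21, 28, 32, 36]
GA_LABELS = ["[21, 28)", "[28, 32)", "[32, 36)"]


def ga_bin(ga_weeks: float) -> Optional[str]:
    """Return the GA bin label for an age in decimal weeks, or None if outside [21, 36)."""
    if not GA_EDGES[0] <= ga_weeks < GA_EDGES[-1]:
        return None
    # bisect_right returns the index of the first edge strictly greater than
    # ga_weeks, so an age lying exactly on an edge (e.g. 28.0) goes to the upper bin.
    return GA_LABELS[bisect_right(GA_EDGES, ga_weeks) - 1]


if __name__ == "__main__":
    for ga in (21.0, 27.9, 28.0, 35.5, 36.0):
        print(ga, ga_bin(ga))
```

Ages outside the studied range return `None` rather than being forced into the nearest bin. This mirrors the selection step in which only subjects in the relevant age bins were analyzed.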
Fig. 1 General description of the study. a Flowchart of the study sample showing inclusion and exclusion. A total of 219 pregnant patients were imaged across three centers. Seventy-four magnetic resonance imaging (MRI) examinations were excluded due to poor-quality reconstruction, leaving 145 MRI examinations that were annotated and automatically segmented. After selection of subjects in the relevant age bins, 84 MRI examinations were analyzed (27 for [21, 28), 31 for [28, 32), and 26 for [32, 36) weeks). b Distribution of gestational ages across sites. c Design of the study. Subjects are nested within raters: each rater considered the subjects from their own center (M.K. for CHUV, I.V. for Hospital Clínic, and N.G. and A.Ma. for La Timone) and performed the measurements on every reconstruction of each subject. NeSVoR, Neural Slice-to-Volume Reconstruction; SVRTK, Slice-to-Volume Reconstruction ToolKit
Data
Fetal MRI data were acquired on different Siemens Healthineers scanners (Erlangen, Germany) at 1.5 tesla (T) or 3 T, depending on the hospital. The fetal brain MRI protocol included T2w HASTE (half-Fourier acquisition single-shot turbo spin echo) sequences acquired in three orthogonal orientations (axial, coronal, sagittal). Acquisition parameters and the number of acquisitions per subject are detailed in Table 1. The acquisitions are somewhat heterogeneous across sites, reflecting variations in clinical practice, since acquisition protocols differ between imaging centers [1].
Table 1 Metadata regarding the acquisition parameters, the gestational ages of participants, the resolution of the T2w series, and the number of stacks used in the reconstruction algorithm
Data processing
Because clinical fetal brain MRI acquisitions have anisotropic resolution, the stacks acquired in different orientations are combined into a single high-resolution volume using super-resolution reconstruction. Each subject was reconstructed with three super-resolution reconstruction toolkits: Neural Slice-to-Volume Reconstruction (NeSVoR) (v.0.5.0) [14], NiftyMIC (v.0.9.0) [13], and SVRTK (v.auto-2.2.0) [10]. These pipelines were chosen because they are widely used in the community and represent both classical inverse-problem approaches [13, 18] and self-supervised deep learning-based reconstruction [14]. Depending on the hospital, stacks with severe motion or signal drops were excluded through visual inspection [20] and/or automated quality control [23]. At La Timone and Hospital Clínic, stacks were processed with non-local means denoising [24] and N4 bias field correction [25]. All reconstructions used the default parameters of each method at 0.8 mm isotropic resolution, and the resulting super-resolution reconstructed volumes were aligned to a standard orientation.
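Classical inverse-problem approaches to slice-to-volume reconstruction are commonly written in the generic form below. Each acquired low-resolution slice is modeled as a motion-corrected, blurred, and downsampled view of the unknown high-resolution volume. This textbook formulation is given only for orientation. It is not the exact objective of NiftyMIC or SVRTK, whose data terms, regularizers, outlier weighting, and motion models differ.

\[
y_k = D_k B_k M_k\, x + \varepsilon_k, \qquad k = 1, \dots, K,
\]
\[
\hat{x} = \arg\min_{x} \sum_{k=1}^{K} \left\lVert y_k - D_k B_k M_k\, x \right\rVert_2^2 + \lambda\, R(x)
\]

Here \(x\) is the high-resolution volume (0.8 mm isotropic in this study) and \(y_k\) is the \(k\)-th acquired slice. \(M_k\) is the slice's estimated rigid motion, \(B_k\) is the blurring induced by the slice profile (point-spread function), and \(D_k\) is the downsampling to the acquired grid. \(\varepsilon_k\) is noise, \(R\) is a regularizer, and \(\lambda\) is its weight. Motion estimation (slice-to-volume registration) and volume estimation are typically alternated. NeSVoR instead represents the volume with an implicit neural representation, which is optimized in a self-supervised way against the acquired slices.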
For poor-quality reconstructions, different stack combinations were tested until the image quality was deemed sufficient by visual assessment (no evident artifacts or errors from registration/reconstruction). If no combination resulted in a sufficiently high-quality reconstruction, the subject was excluded from the study.
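The stack-selection loop can be sketched as a search over subsets of the available stacks. In this sketch, `reconstruct` and `is_acceptable` are hypothetical callables. They stand in for running a reconstruction toolkit and for the visual assessment, respectively. Trying larger subsets first and requiring at least three stacks are illustrative assumptions, not the documented procedure.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple


def find_acceptable_reconstruction(
    stacks: Sequence[str],
    reconstruct: Callable[[Tuple[str, ...]], object],
    is_acceptable: Callable[[object], bool],
    min_stacks: int = 3,  # assumption: e.g. one stack per orientation
) -> Optional[Tuple[Tuple[str, ...], object]]:
    """Try stack subsets, largest first; return the first (subset, volume) judged acceptable.

    Returns None when no subset passes, i.e. the subject would be excluded.
    """
    for size in range(len(stacks), min_stacks - 1, -1):
        for subset in combinations(stacks, size):
            volume = reconstruct(subset)
            if is_acceptable(volume):
                return subset, volume
    return None
```

A `None` result corresponds to the exclusion rule stated above. In practice, each call to `reconstruct` is expensive, so the number of combinations actually tried would be kept small.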
Biometric measurements
Biometric measurements were performed on both the low-resolution 2-D stacks and the 3-D super-resolution reconstructed volumes using ITK-SNAP (University of Pennsylvania, PA, USA). Measurements were performed at each site by medical experts in obstetrics and/or pediatric image analysis: M.K. (15 years of experience) for CHUV, I.V. (5 years of experience) for Hospital Clínic, and N.G. (>20 years of experience) and A.Ma. (5 years of experience) for La Timone. Subjects are therefore nested within raters (Fig. 1). Following established guidelines for fetal brain MRI biometry [1, 3, 16, 26], the following measurements were performed: length of the corpus callosum (LCC), height of the vermis (HV), brain and skull biparietal diameters (bBIP, sBIP), and transverse cerebellar diameter (TCD). An example of the measurements on one subject is shown in Fig. 2. These measurements were then compared with the reference values of Kyriakopoulou et al. [16].
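Each of these linear measurements reduces to the distance between two landmarks placed by the rater. The following is a minimal sketch of that conversion. It assumes landmarks given as voxel indices on an axis-aligned grid and ignores the image direction matrix. In millimeters, each axis is then scaled by its voxel spacing. `landmark_distance_mm` is a hypothetical helper, not an ITK-SNAP function.

```python
import math
from typing import Sequence


def landmark_distance_mm(
    p: Sequence[float], q: Sequence[float], spacing_mm: Sequence[float]
) -> float:
    """Euclidean distance in mm between two landmarks given as voxel indices."""
    # Scale each axis difference by that axis's voxel spacing before combining.
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing_mm)))


if __name__ == "__main__":
    # Two landmarks 50 voxels apart along one axis of a 0.8 mm isotropic volume.
    print(landmark_distance_mm((10, 20, 30), (10, 20, 80), (0.8, 0.8, 0.8)))
```

On the 0.8 mm isotropic reconstructions, all three spacings are equal. On a low-resolution stack, the through-plane spacing is much larger than the in-plane spacing. This is one reason in-plane measurements on a well-aligned stack are preferred there.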
Fig. 2 Illustration of the measurements on a healthy fetus at 31 weeks of gestation using T2w HASTE data. a–c Biometric measurements on a volume reconstructed using the Slice-to-Volume Reconstruction ToolKit (SVRTK). Axial (a): brain and skull biparietal diameters (bBIP and sBIP). Sagittal (b): length of the corpus callosum (LCC) and height of the vermis (HV). Coronal (c): transverse cerebellar diameter (TCD). d–f Automated segmentation using the Brain vOlumetry and aUtomated parcellatioN (BOUNTI) method in axial (d), sagittal (e), and coronal (f) planes. g–i Measurements on the T2w HASTE stacks in axial (g), sagittal (h), and coronal (i) planes. Each column represents a different stack. The stacks were re-oriented for visualization purposes. j–l Through-plane views of the low-resolution images in (g, h, i), showing the thick slices of the low-resolution acquisitions in coronal (j), axial (k), and sagittal (l) planes. bBIP, brain biparietal diameter; HV, height of the vermis; LCC, length of the corpus callosum; sBIP, skull biparietal diameter; TCD, transverse cerebellar diameter
On the low-resolution stacks, each rater chose the stack best suited to each measurement in terms of alignment and image quality. On the 3-D super-resolution reconstructed volumes, raters could manually re-align the images (rigid transformation) before performing the measurements. In total, each of the four raters performed around 550 measurements (5 structures × 4 image variants (1 low-resolution + 3 super-resolution reconstructions) × 26–29 subjects).
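The per-rater total follows directly from the design. Multiplying 5 structures by 4 image variants by the 26 to 29 subjects assigned to each rater gives

\[
5 \times 4 \times 26 = 520 \quad \text{to} \quad 5 \times 4 \times 29 = 580,
\]

which is approximately 550 measurements per rater.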
Automated volumetry
Automated volumetric evaluation was carried out on the super-resolution reconstructed volumes using Brain vOlumetry and aUtomated parcellatioN (BOUNTI) [27], a recent deep learning segmentation method. BOUNTI segments the brain into 19 regions and was trained on a large corpus of manually segmented brain volumes. An illustration of the segmentations is provided in Fig. 2. In our analysis, we considered five volumes for which reference values are available [16]: extra-cerebral cerebrospinal fluid (extra-cerebral CSF), cortical gray matter (cortical GM), cerebellum, supratentorial brain tissue (ST), and total lateral ventricles. Cortical GM and cerebellum volumes were also compared with the growth curves of Machado-Rivas et al. [28], who reconstructed the T2w stacks with the method of Kainz et al. [11] and segmented them automatically with an atlas-based approach [15].
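Once a label map is available, each regional volume is the number of voxels carrying that region's label multiplied by the voxel volume. At the reconstruction resolution used here, the voxel volume is 0.8^3 = 0.512 mm^3. Composite structures such as the total lateral ventricles are sums over several labels. The sketch below assumes a flattened sequence of integer labels with 0 as background. The label IDs mentioned below are hypothetical and do not reproduce BOUNTI's actual label map. In practice, the label array would be loaded from the NIfTI segmentation, for example with nibabel.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence


def label_volumes_ml(
    labels: Iterable[int], voxel_size_mm: Sequence[float] = (0.8, 0.8, 0.8)
) -> Dict[int, float]:
    """Volume in mL (1 mL = 1000 mm^3) of every non-background label."""
    voxel_mm3 = voxel_size_mm[0] * voxel_size_mm[1] * voxel_size_mm[2]
    counts = Counter(labels)
    # Label 0 is assumed to be background and is skipped.
    return {lab: n * voxel_mm3 / 1000.0 for lab, n in counts.items() if lab != 0}


def grouped_volume_ml(volumes: Mapping[int, float], label_ids: Iterable[int]) -> float:
    """Sum the volumes of several labels, e.g. left + right lateral ventricle."""
    return sum(volumes.get(lab, 0.0) for lab in label_ids)
```

Suppose labels 7 and 8 (hypothetical IDs) encoded the left and right lateral ventricles. Then `grouped_volume_ml(label_volumes_ml(seg), [7, 8])` would give the total lateral ventricular volume.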
Qualitative assessment
We aimed to obtain expert feedback on the appearance, particularly the intensity and visibility, of key anatomical structures used to assess fetal development. Four neuroradiologists (N.G., >20 years of experience; A.Ma., 5 years of experience; M.K., 15 years of experience; M.G.C., 12 years of experience) were asked to qualitatively assess the volumes reconstructed from six subjects with each of the three super-resolution reconstruction methods. The subjects were selected to cover different GA bins (26, 28, 29, 30, 32, and 34 weeks). They were also required to have high-quality 3-D super-resolution reconstructed volumes for all three methods, so as not to bias the comparison toward any method. In a first round, the clinicians viewed all super-resolution reconstructed volumes of a given subject and rated how clearly different structures appeared in each volume. The questions asked and structures rated are detailed in supplementary Table S9. In a second round, raters compared the super-resolution reconstructed volumes of each subject with the corresponding low-resolution stacks. They first ranked the three super-resolution reconstructed volumes of each subject by how likely they would be to use them (ties allowed). They then indicated whether they would choose the super-resolution reconstructed volume over the low-resolution stacks for their clinical assessment. They also indicated whether the super-resolution reconstructed volume provided more information than the low-resolution stacks for a radiological evaluation.
Statistical analysis
A univariate analysis was first carried out to assess the influence of the super-resolution reconstruction algorithm on the biometric (respectively, volumetric) measurements. Because the data were not normally distributed, a Friedman test (the non-parametric equivalent of a repeated-measures ANOVA; N = 252, degrees of freedom = 2) was used to test for differences across super-resolution reconstruction methods. No correction for multiple comparisons was applied at this stage so that even small effects of the super-resolution reconstruction technique could be detected. A correction would reduce the sensitivity of the test to such differences and thereby make it easier to support our hypothesis. Post-hoc testing used pairwise Wilcoxon rank-sum tests, and Bonferroni correction for multiple comparisons was applied at this stage. Effect sizes were reported as \(r = Z/\sqrt{N}\), where \(Z\) is the standardized test statistic and \(N\) the number of observations.
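The sketch below makes the univariate test concrete. It computes the Friedman statistic with subjects as blocks and the three reconstruction methods as treatments. With 84 subjects, this gives 84 × 3 = 252 observations, consistent with the N reported above. The sketch also computes the p-value from the chi-square distribution with k − 1 = 2 degrees of freedom, the effect size \(r = Z/\sqrt{N}\), and Bonferroni-adjusted p-values. It is a simplified illustration with hypothetical function names. It omits the tie correction that library implementations such as R's `friedman.test` apply, and it takes \(Z\) as given from a post-hoc test rather than computing it.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..k within one block (subject); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # 1-based average rank of the run
        i = j + 1
    return ranks


def friedman_statistic(blocks: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square without tie correction.

    blocks[i][j] is the measurement of subject i with method j.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(x: float) -> float:
    """P(X >= x) for a chi-square variable with 2 degrees of freedom, i.e. exp(-x/2)."""
    return math.exp(-x / 2.0)


def effect_size_r(z: float, n_obs: int) -> float:
    """Effect size r = Z / sqrt(N)."""
    return z / math.sqrt(n_obs)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three methods there are three pairwise post-hoc comparisons, so `bonferroni` multiplies each post-hoc p-value by 3. The closed form `exp(-x/2)` holds only because the test has exactly two degrees of freedom here.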
We confirmed these results with a multivariable regression evaluating the impact of super-resolution reconstruction on the biometric (resp. volumetric) measurements while accounting for covariates. A t-distributed Generalized Additive Model for Location, Scale and Shape (GAMLSS) [29, 30] was fitted with the biometric (resp. volumetric) measurement as the response and the super-resolution reconstruction algorithm as the fixed effect of interest. GA was included as a covariate, rater as a covariate for biometry only (volumetry being computed automatically), and subject as a random effect.
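The model described above can be written as follows, with \(i\) indexing subjects and \(j\) the reconstruction method. This form rests on assumptions the text does not specify: a linear GA effect, covariates entering only the location parameter, and constant scale and shape.

\[
y_{ij} \sim \mathcal{T}\left(\mu_{ij}, \sigma, \nu\right),
\]
\[
\mu_{ij} = \beta_0 + \beta_{\mathrm{SRR}(j)} + \beta_{\mathrm{GA}}\,\mathrm{GA}_i + \gamma_{\mathrm{rater}(i)} + b_i, \qquad b_i \sim \mathcal{N}\left(0, \sigma_b^2\right)
\]

Here \(\mathcal{T}(\mu, \sigma, \nu)\) is a t distribution with location \(\mu\), scale \(\sigma\), and degrees of freedom \(\nu\). \(\beta_{\mathrm{SRR}(j)}\) is the effect of interest, and \(\gamma_{\mathrm{rater}(i)}\) is the effect of the rater who measured subject \(i\), dropped for volumetry. \(b_i\) is the subject random intercept. The heavier tails of the t distribution make the fit less sensitive to outlying measurements than a Gaussian model would be.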
The choice of a GAMLSS over a simpler t-distributed linear mixed-effects (LME) model was based on visual inspection of the residual distribution (R function fitdistrplus::descdist) and of the cumulative distribution function (R function DHARMa::simulateResiduals). While both the LME and the GAMLSS had well-aligned cumulative distribution functions, the GAMLSS showed a less dispersed residual distribution, suggesting more stable estimates.
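The simulation-based check behind DHARMa::simulateResiduals can be sketched as follows. For each observation, many values are drawn from the fitted model's predictive distribution, and the fraction falling at or below the observed value is recorded. If the model is well specified, these scaled residuals are approximately uniform on [0, 1], which the Kolmogorov–Smirnov distance below quantifies. This is a simplified illustration of the idea, not DHARMa's implementation, which among other things handles ties for discrete responses. `simulate` is a hypothetical stand-in for drawing from a fitted LME or GAMLSS.

```python
import random
from typing import Callable, List, Sequence


def scaled_residuals(
    observed: Sequence[float],
    simulate: Callable[[int, random.Random], float],
    n_sim: int = 250,
    seed: int = 0,
) -> List[float]:
    """Fraction of model simulations at or below each observed value."""
    rng = random.Random(seed)
    residuals = []
    for i, y in enumerate(observed):
        sims = [simulate(i, rng) for _ in range(n_sim)]
        residuals.append(sum(s <= y for s in sims) / n_sim)
    return residuals


def ks_distance_uniform(u: Sequence[float]) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF of u and Uniform(0, 1)."""
    xs = sorted(u)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy check: observations and simulations come from the same N(10, 2) model,
    # so the residuals should look uniform and the KS distance should be small.
    obs = [rng.gauss(10.0, 2.0) for _ in range(200)]
    res = scaled_residuals(obs, lambda i, r: r.gauss(10.0, 2.0))
    print(round(ks_distance_uniform(res), 3))
```

Plotting the sorted residuals against uniform quantiles gives the QQ-style view that the cumulative-distribution comparison above relies on.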
The qualitative analysis relied on a smaller sample. We nonetheless carried out a univariate analysis using a Friedman test (N = 72, degrees of freedom = 2). When significant results were found, post-hoc testing was done using pairwise Wilcoxon rank-sum tests with Bonferroni correction for multiple comparisons. All statistical analyses were carried out using R version 4.2.2 (R Foundation for Statistical Computing, Vienna, Austria). To simplify the analysis, the ratings of R3 were used only in a confirmatory analysis reported as a supplementary experiment. The main analysis therefore has subjects simply nested within raters.