Inter- and intra-observer variability of software quantified bowel motility measurements of small bowel Crohn’s disease: findings from the MOTILITY trial

In this prospective, multicentre study of 104 SBCD patients, we found moderate to good interobserver agreement for mMRI-derived measures of segmental small bowel motility, both for radiologists inexperienced and experienced in mMRI. Furthermore, intra-observer agreement was also moderate for two mMRI-experienced radiologists. Bland–Altman analysis and data scatter plots also generally support translation of mMRI to clinical practice with clinically acceptable reproducibility and potential to assess disease activity, for example, when assessing therapeutic response.

Currently, radiologists rely on anatomical MRE observations, such as mural thickness and mural and perimural oedema, when making therapeutic response assessments [2, 3, 5, 6, 7]. However, recent attention has focussed on functional MRE variables, particularly bowel motility and its incremental value for disease assessment. Quantified terminal ileal motility is more sensitive for disease activity than the MaRIA score when judged against endoscopic and histopathological CD reference standards [12, 13, 15, 19]. Furthermore, informing the design of the MOTILITY trial, initial data suggested mMRI may be better able to capture early response to biologic therapy than morphological observations [13, 20]. It is vital for clinical utility that any promising novel imaging biomarker shows adequate reproducibility within and between readers.

When morphological observations such as bowel wall thickness and T2 signal are combined into disease activity scores, there is relatively good inter- and intra-reader agreement; in 50 MRE studies analysed three times each by four experienced radiologists, Jairath et al found an inter-rater ICC of 0.67–0.71 and intra-rater ICC of 0.87–0.89 for the MaRIA, extended London and London activity scores [14]. However, such activity scores are not used routinely in clinical practice, where radiologists prefer subjective assessment. Notably, in the study by Jairath et al, scores for individual anatomical metrics (e.g., mural thickness and mural T2 signal) showed lower agreement.

Conversely, there are relatively sparse published data regarding mMRI inter- and intra-observer agreement, predominantly from studies using few readers and MRI datasets. Plumb et al found good agreement between two readers at both baseline and follow-up mMRI for SBCD (ICC = 0.65, p < 0.001 and ICC = 0.71, p < 0.001, respectively) in a single-centre, predominantly retrospective study of 46 patients [13]. Dillman et al investigated mMRI in a paediatric and young adult cohort of 20 newly diagnosed SBCD patients starting anti-TNFα therapy and 16 healthy control participants, interpreted by an experienced radiologist without prior mMRI experience and a non-medical operator [19]. Terminal ileal motility improved in response to therapy at 6 weeks and 6 months, and the authors reported an ICC of 0.89 (95% CI: 0.83–0.93). A study of bowel motility in 15 healthy volunteers found that segmental mMRI measurements by one experienced and one inexperienced reader had an ICC of 0.979 (p < 0.0001; Bland–Altman limits of agreement 95% CI: −28.9 to 45.9 AU), with ICCs of 0.992 and 0.960 (p < 0.0001) for intra-observer agreement [21].

In the present study, we also found moderate levels of agreement, with an intraclass correlation coefficient of 0.59 to 0.70. Specifically, we found that both experienced and inexperienced radiologists exhibited moderate interobserver agreement for segmental small bowel motility, which was maintained when combining the experienced readers’ scores with those of the inexperienced radiologists. Intra-observer agreement for the two mMRI-experienced radiologists was also moderate, although the 95% CIs were wide owing to the relatively small number of datasets used for this part of the analysis.
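For readers less familiar with the statistic, the following is a minimal sketch of how an intraclass correlation coefficient can be computed from a table of segmental motility scores. It assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the text above does not state which ICC form was used, so this choice, the function name `icc_2_1` and the example scores are illustrative assumptions and not the trial's analysis code or data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per bowel segment (subject),
             each row holding one score per reader.
    """
    n = len(ratings)          # number of segments
    k = len(ratings[0])       # number of readers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 segments and >= 2 readers, no missing scores")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-segment mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores (AU), NOT trial data: reader 2 always scores 1 AU higher.
offset_scores = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
identical_scores = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

icc_offset = icc_2_1(offset_scores)
icc_identical = icc_2_1(identical_scores)
```

In this toy example, a constant offset between readers lowers the absolute-agreement ICC below 1 even though the two readers rank the segments identically, which is one reason a single ICC value is usually read alongside the raw data.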

Whilst interobserver agreement was apparently higher between readers without experience of mMRI than between those with, the number of measurements made by the experienced readers was almost double that of the inexperienced readers, likely increasing precision around the estimate. Furthermore, the mean MRI motility score from the 30 randomly selected patients testing agreement between inexperienced readers was relatively low, suggesting these datasets included more active (and therefore immotile) disease. ROI placement is easier and less subjective when the bowel is immotile, compared to less inflamed (and more mobile) segments. Indeed, while ICC is commonly used to assess reader agreement, Bland–Altman and raw scatter plots are often more informative as to whether agreement is clinically acceptable, which is dependent on the intended use for the tool. In the present study, the Bland–Altman analysis and scatter plots suggest agreement is lower when bowel is more motile, usually reflecting normal (responding) bowel (typical mean value of > 220 AU [21]). Further evidence for this observation is the pattern of intra-observer agreement between mMRI-experienced radiologists; the datasets of one reader had a low mean motility (and more active disease) with tighter intra-observer agreement than the other. Overall, it is reassuring that agreement was clinically acceptable in the typical range of active disease (< 220 AU), and given that treatment response is predicated on improved motility scores, increased disagreement for bowel approaching normality has less clinical impact.
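As an illustration of the Bland–Altman approach referred to above, the sketch below computes the mean difference (bias) between two readers and the conventional 95% limits of agreement (bias ± 1.96 × SD of the paired differences). The function name `bland_altman` and the motility scores are hypothetical and chosen only to show the calculation; they are not trial data, and the trial's own analysis may have differed in detail.

```python
from statistics import mean, stdev


def bland_altman(reader_a, reader_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(reader_a) != len(reader_b) or len(reader_a) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical motility scores (AU) for five bowel segments, NOT trial data.
# The readers differ more on the more motile segments.
reader_a = [120.0, 150.0, 200.0, 260.0, 310.0]
reader_b = [118.0, 155.0, 190.0, 280.0, 290.0]

bias, lower, upper = bland_altman(reader_a, reader_b)

# Pairs for a Bland–Altman plot: (mean of the two readers, difference).
plot_points = [((a + b) / 2, a - b) for a, b in zip(reader_a, reader_b)]
```

Plotting `plot_points` with horizontal lines at `bias`, `lower` and `upper` gives the usual Bland–Altman display, in which any widening of the differences at higher mean motility is visible directly rather than being averaged into a single coefficient.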

Our study has several strengths. We included radiologists experienced in MRE interpretation, but not necessarily mMRI, as these are more representative of clinical practice. Our sample size was informed by a power calculation, and a priori, we defined a protocol for ROI placement. Furthermore, to mirror clinical practice, radiologists were provided with limited anatomical sequences to help guide ROI placement and instructed on which small bowel segment to place the ROI. The prospective nature of our study also meant that MRI acquisition protocols could be standardised. While we present ICC data, we also performed Bland–Altman analysis and provide raw scatter plots to better communicate the clinical acceptability around the levels of agreement. Such provisions suggest our results will generalise to standard clinical practice.

There are also some limitations. It is possible that case mix (e.g., disease location and phenotype) influenced mMRI measurements. However, this prospective study included multiple patients from 13 centres (and readers from 3 different centres) and therefore is likely representative of typical clinical practice. While there are many other variables that can be captured with mMRI, such as bowel contractile magnitude and frequency, we focused on one motility metric based on the standard deviation of the Jacobian, as it has a strong evidence base, is simple to perform, and is easy for clinicians and patients to interpret. An ongoing multicentre study is directly assessing the real-world impact of this single mMRI-derived metric in SBCD on clinical decision making, for both radiologists and gastroenterologists (CONTEXT trial, REC: 21/PR/0592).

In summary, we found that there was moderate to good interobserver agreement for mMRI-quantified segmental small bowel motility in both mMRI-experienced and inexperienced readers. We also found moderate intra-observer agreement in mMRI-experienced readers. This level of mMRI reproducibility is comparable to that of standard MRE morphological variables used in clinical practice. Agreement was best when the bowel was less mobile, i.e., abnormal, which, given the intended use of mMRI, overall supports the ongoing clinical translation of mMRI as a biomarker of disease activity and treatment response in CD.
