Clinical trials for evidence-based radiology: to randomize or not to randomize? This is the question

The goal of every physician is to optimize patient outcomes through accurate decision-making. Over the last thirty years, evidence-based medicine (EBM) has been the cornerstone of this decisional process, providing an approach structured on the results of clinical research, i.e. the best “external evidence”, as well as personal clinical expertise and the patient’s values and preferences [1, 2]. As stated by David L. Sackett in 1996 [3], “evidence-based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence-based medicine means integrating individual clinical expertise with the best available external evidence from systematic research”. Technically speaking, “evidence-based medicine is the use of mathematical estimates of the risk of benefit and harm, derived from high-quality research on population samples, to inform clinical decision making in the diagnosis, investigation or management of individual patients”, as underlined by Trisha Greenhalgh and Anna Donald in 2000 [4].

The “evidence” is ranked in a hierarchy of levels, from expert opinion (the lowest, so-called eminence-based medicine) to randomized clinical trials (RCTs) and their meta-analyses (the highest level). One consequence of this ranking is that RCTs are widely regarded as the best research design, providing a structure for data collection, mitigating bias, and generating dependable and applicable results. RCTs are typically conducted in controlled environments, involving specific patient populations, well-defined inclusion and exclusion criteria, and standardized interventions [5]. However, RCTs have inherent limitations: randomization does not guarantee equivalence between intervention and control groups, intervention effect estimates are not always precise, the effects of covariates may not be fully explored, and results apply only to strictly selected populations [6]. The last limitation can cause difficulties, for instance, in patients with comorbidities, which are increasingly common as the population ages. It is important to note that RCTs should be conceived as “part of a cumulative program […] to discover not ‘what works’, but ‘why things work’” [5]. Notwithstanding all these limitations, RCTs, when available, provide the evidence for choosing a treatment, including drugs, surgery, and any other treatment type such as interventional radiology or radiation therapy.

We must note that the deep logic of RCTs rests on the impossibility of trying two (or more) different treatments/interventions in the same patient at the same time, which would practically eliminate variability and confounding factors. In fact, ophthalmologists and dermatologists can administer two different treatments at the same time to the same patient, one to each eye or arm/leg. Outside these cases, different treatments/interventions cannot be administered to the same patient at the same time without mutual interference. For this reason, RCTs were conceived and are practiced as a surrogate for an intraindividual comparison, which is theoretically the best way to provide clinical evidence.

Thus, we first underline that, for diagnostic imaging, intraindividual comparison is usually possible. In fact, the Oxford Centre for EBM (http://www.cebm.ox.ac.uk/resources/levels-of-evidence/oxford-centre-for-evidence-based-medicine-levels-of-evidence-march-2009) affirms that the highest level of evidence for studies on diagnostic performance is obtained through well-designed prospective cohort studies and comparative research. As a matter of fact, nobody asks for RCTs showing the advantages of MRI versus CT for diagnosing brain tumors, of CT pulmonary angiography versus radiography for diagnosing pulmonary embolism, or of ultrasound versus clinical examination for diagnosing acute cholecystitis.

A special case is that of breast MRI for local and contralateral staging of breast cancer, where surgeons and oncologists sometimes ask for evidence from RCTs, considering the underlying risk of overtreatment, i.e. of more extensive surgery than needed [7]. Here lies the inherent difficulty of relating a diagnostic procedure to patient outcomes: numerous confounding factors come into play, including a large spectrum of surgical, systemic therapy, and radiation therapy options, as well as variability among surgeons. This complexity makes it challenging to attribute a clinical outcome specifically to a diagnostic test for local and contralateral staging. Insights on preoperative breast MRI are coming from prospective observational data from a very large study [8,9,10].

Conversely, prostate MRI has gained acceptance for the opposite reason, having been proven a valuable tool in reducing unnecessary biopsies and surgical overtreatment, guiding targeted biopsy, and opening the possibility of MRI surveillance instead of surgery [11,12,13]. Here, the practical impossibility of evaluating MRI-targeted biopsy versus non-MRI-targeted biopsy in the same patient resulted in an RCT, which demonstrated the superiority of MRI-targeted biopsy [14].

Whenever possible, intraindividual prospective comparative studies focusing on sensitivity and specificity are the best way to evaluate diagnostic performance. Thus, these studies, not RCTs, are the best evidence on which the choice of a diagnostic test must be based. Notably, this reasoning holds specifically for medical imaging applied in the “clinical” context, i.e. for patients who present symptoms or signs (or suspicious findings at a screening test). First-level population-based screening tests are a different issue, even if performed with the same imaging modalities used in the clinical context. In fact, in the screening context, we need to eliminate lead-time bias (when early disease detection through screening appears to improve survival but only lengthens the time between diagnosis and clinical presentation) and length bias (screen-detected cancers show a lower growth rate than interval cancers). Thus, in the case of screening, rigorous RCTs are required, with a sample size sufficiently large to deal with the low incidence, even in high-risk populations. RCTs can provide evidence of the effectiveness of screening strategies by comparing outcomes in the screened population against a control group. One recent relevant example is the proposal of breast MRI for screening women with extremely dense breasts [15,16,17].
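To illustrate why paired (intraindividual) designs are statistically efficient for comparing diagnostic performance, the following is a minimal sketch using entirely hypothetical counts: each lesion is assessed by both tests, so the comparison of sensitivities reduces to McNemar's exact test on the discordant pairs, a standard approach for paired diagnostic data (no confounding from between-patient variability, as each patient serves as their own control).

```python
from math import comb

def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly diseased cases that the test detects."""
    return tp / (tp + fn)

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pairs.
    b = lesions positive on test A only, c = positive on test B only.
    Under H0 (equal sensitivity) discordant results split 50/50,
    so the p-value is a two-sided binomial tail probability."""
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1))
    return min(2 * tail / 2 ** n, 1.0)

# Hypothetical paired study: 100 proven lesions, both tests in each patient.
# Test A detects 90 (sensitivity 0.90), test B detects 78 (sensitivity 0.78);
# discordant pairs: A+/B- = 15, A-/B+ = 3.
sens_a = sensitivity(90, 10)  # 0.90
sens_b = sensitivity(78, 22)  # 0.78
p_value = mcnemar_exact(15, 3)  # ~0.0075
```

With these illustrative numbers, a 12-point sensitivity difference is detected from only 18 discordant pairs, whereas a two-arm randomized design would need far more patients to separate the same effect from between-patient variability.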

In conclusion, the type of evidence required depends on the context. For treatment evaluation, RCTs remain the best approach. For diagnostic imaging (where symptoms or signs are present), intraindividual comparison studies focusing on sensitivity and specificity provide the best evidence for diagnostic performance. For screening programs, RCTs are essential to eliminate biases and evaluate the societal impact. However, in the current era of big data and artificial intelligence, the position of RCTs at the top level of evidence could be reshaped by the increasing role of so-called real-world evidence [18], which incorporates data from routine clinical practice and provides additional insights into treatment and diagnostic outcomes. The comparison between RCTs and real-world evidence is an interesting perspective to be pursued.
