A review and perspective on the neural basis of radiological expertise

INTRODUCTION

“Tai Shang Yi Fen Zhong, Tai Xia Shi Nian Gong” is a Chinese proverb that, translated literally, says, “1 minute on the stage necessitates 10 years of practice off the stage.” Put more plainly, acquiring the knowledge and skills needed to perform at an expert level takes tremendous time and effort. The work of Ericsson et al. on the role of sustained, deliberate practice in expertise supports this ancient and intuitive idea.[1] The idea has since spread into popular science writing; Malcolm Gladwell’s Outliers, for example, is often credited with popularizing the 10,000-h rule for acquiring expertise.

Kundel et al. examined radiologists’ eye movements during pulmonary nodule detection and developed a four-step model of nodule detection: orientation, scanning, pattern recognition, and decision-making.[2] In addition, the prerequisites for radiological interpretation include knowledge of radiological anatomy, knowledge of pathology, the imaging modality used, and the viewing conditions.[3] van der Gijp et al. combined these factors into a three-component model of radiological image interpretation consisting of perception, analysis, and synthesis. Perception was defined as the “identification of radiological findings”; analysis, as the “examination of the features of radiological findings”; and synthesis, as “the combination of radiological and clinical findings into a conclusion about the differential diagnosis and patient management.”[4] Although many studies have examined radiological expertise and error,[5-7] the underlying neural mechanisms remain largely unexplored.

FUSIFORM FACE AREA (FFA)

The FFA has been shown to take part in face processing.[8] Some researchers also propose that the FFA supports the holistic processing that experts often employ. Bilalic et al. demonstrated that the FFA differentiates X-rays from other stimuli and that, in radiologists, FFA activation patterns allow X-ray stimuli to be distinguished from other stimuli, supporting the idea that the FFA underlies holistic processing in experts.[9] Furthermore, Kok et al. found that radiologists showed increased right FFA activation during what the authors called the “holistic mode (2s trials),” whereas the difference in FFA activation between radiologists and non-radiologists was smaller during the “search-to-find mode (10s trials).” This provides further empirical support for the FFA’s role in processing visual stimuli within one’s field of expertise.[10] Engel et al. agreed that the FFA is implicated in radiological expertise but argued that this expertise does not depend on holistic processing of images.[11] Whether FFA involvement reflects holistic processing remains unsettled, but the existing evidence supports the FFA’s role in radiological expertise.

FFA activity has also been shown to be modulated by working memory load.[12] In studies of car and bird experts by Gauthier and colleagues, the right FFA and occipital face area showed expertise effects.[13] Xu extended these studies by using side views of cars and birds, which avoids stimuli that could resemble faces, and found similar right FFA activation, further evidence for right FFA involvement in visual expertise.[14] In expert radiologists, the FFA shows increased activity while the left lateral occipital cortex shows decreased activity,[15] which provides the most direct evidence for the FFA’s role in radiological expertise.

The FFA’s roles in working memory and radiological expertise may help explain the neural basis of radiological expertise, but more investigation is needed to establish how the FFA, working memory, and radiological expertise are linked.

WORKING MEMORY’S INTERACTION WITH LONG-TERM MEMORY

In his seminal 1956 paper, Miller proposed that people can hold 7 ± 2 chunks of information in working memory.[16] Subsequent work has produced a more critical and nuanced view of working memory, but a detailed account is beyond the scope of this article. Working memory is often defined as “the retention of a small amount of information in a readily accessible form.”[17] In experts, working-memory tasks activate brain regions implicated in long-term memory tasks, a pattern not seen in novices.[18] Furthermore, Guida et al. noted that functional magnetic resonance imaging (fMRI) studies of novices undergoing training showed decreased activation in brain regions implicated in working-memory tasks. They hypothesized that this initial decrease marks the first stage of a functional reorganization of the brain, which, after extended training and the acquisition of expertise, eventually leads to parts of long-term memory being recruited as working memory. This account is consistent with both Ericsson and Kintsch’s long-term working memory theory[19] and Gobet and Simon’s template theory.[20] Pesenti et al. used positron emission tomography to study calculating prodigies and showed that mental calculation expertise relied on episodic (long-term) memory rather than on increased activity in the regions used by novices.[21] Although, to our knowledge, no study has yet examined whether such long-term memory “borrowing” is implicated in radiological expertise, this mechanism could be part of the radiological expertise story.
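
The idea that stored long-term patterns reduce the load on working memory can be illustrated with a toy chunking example. The sketch below is our own illustration, not a model from the cited studies: the lexicon `KNOWN_CHUNKS` and the helper `chunk` are hypothetical, and the greedy longest-match rule is simply one convenient way to recode a sequence into known units.

```python
# Toy illustration of chunking (hypothetical, not from the cited studies):
# a "long-term memory" lexicon lets a letter string be recoded into fewer,
# larger units, so fewer items need to be held in working memory.
KNOWN_CHUNKS = {"FBI", "CIA", "USA", "IBM"}  # hypothetical stored patterns


def chunk(sequence: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match recoding of `sequence` into chunks.

    Characters that match no stored pattern become single-item chunks.
    """
    max_len = max((len(c) for c in lexicon), default=1)
    chunks: list[str] = []
    i = 0
    while i < len(sequence):
        # Try the longest stored pattern first, then shorter ones.
        for size in range(min(max_len, len(sequence) - i), 0, -1):
            piece = sequence[i:i + size]
            if size == 1 or piece in lexicon:
                chunks.append(piece)
                i += size
                break
    return chunks


letters = "FBICIAUSAIBM"
novice = chunk(letters, set())         # no stored patterns: 12 single letters
expert = chunk(letters, KNOWN_CHUNKS)  # stored patterns: 4 chunks
print(len(novice), len(expert))        # prints: 12 4
```

Without stored patterns, the 12 letters exceed the 7 ± 2 span; with them, the same input fits comfortably as four chunks.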

PREDICTIVE CODING/TOP-DOWN PROCESSING

Both top-down and bottom-up processing are involved in radiological interpretation.[22] The intraparietal cortex and superior frontal cortex belong to the top-down processing system and are modulated by stimulus detection.[22]

Predictive coding originated in signal processing but has since been adopted and extended by the neuroscience community as a potential mechanism by which the brain carries out probabilistic inference.[23,24] Linear predictive coding, predictive coding in the retina, Rao and Ballard’s algorithm, predictive coding/biased competition-divisive input modulation, and free-energy formulations all fall under the umbrella term of predictive coding.[23] In fMRI studies framed in terms of minimizing error or free energy, activity in the orbitofrontal cortex and the hippocampus correlates with “high-level” predictions that compare expectation and stimulus, whereas “low-level” predictions correlate with activity in the retinotopic visual cortex.[25] In addition, Egner et al. used fMRI to show that population-level neural responses are modulated by expectation and surprise, not just by the stimulus features themselves.[26,27] Given the heavy reliance on visual perception in radiological interpretation, evidence on expectation effects specific to the visual cortex is of particular interest. Perceptual expectation decreases the response amplitude in the primary visual cortex (V1) while improving stimulus representation, suggesting that expectation sharpens the representation in V1.[28-31] Other brain areas, including the FFA,[32] frontal cortex,[33] and V2 and V3,[34] also exhibit expectation-dependent activity.
Further, magnetoencephalography studies, which offer higher temporal resolution, found that expectations elicit a neural signal shortly before the stimulus is presented, potentially representing an expectation template.[35]
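
The core loop shared by predictive coding schemes, in which a top-down prediction is compared with the input and the resulting prediction error updates the internal estimate, can be sketched in a few lines. This is a minimal single-layer sketch in the spirit of Rao and Ballard’s algorithm, assuming a fixed linear generative model and no priors or hierarchy; the weights `U`, the latent causes `r_true`, and the function `infer` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative model: 16 "sensory" units driven by 3 latent causes.
U = rng.normal(size=(16, 3))         # fixed top-down generative weights
r_true = np.array([1.0, -0.5, 2.0])  # latent causes that produced the input
x = U @ r_true                       # noiseless sensory input


def infer(x, U, r0, lr=0.02, steps=2000):
    """Update the latent estimate r by gradient descent on squared prediction error."""
    r = r0.copy()
    errors = []
    for _ in range(steps):
        prediction = U @ r              # top-down prediction of the input
        error = x - prediction          # bottom-up prediction error
        errors.append(float(error @ error))
        r += lr * U.T @ error           # gradient step on 0.5 * ||x - U r||^2
    return r, errors


r_hat, errors = infer(x, U, r0=np.zeros(3))
print(np.round(r_hat, 3), errors[0] > errors[-1])
```

As the estimate improves, the prediction error shrinks; in this framing, a well-matched expectation leaves little error to signal, which is one reading of the reduced response amplitudes described above.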

PREDICTIVE-CODING FRAMEWORK AND FFA, OUR FRAMEWORK, AND WAYS TO TEST IT

We hypothesize that radiological expertise can be modeled within a predictive coding framework: predictions and prediction error (PE) calculations are an integral part of image-based diagnosis in clinical radiology, and this predictive framework represents a neural “template” (e.g., in the FFA[9-11,14,15]) that serves as a signature of radiological expertise. We further hypothesize that radiology experts will show lower fMRI activity when viewing imaging studies, in line with their expectations, and that this may not be the case outside their field of expertise. We propose an experiment with two fMRI probabilistic visual search and interpretation tasks: (1) a trained detection task involving the assessment of mammograms with or without cancer and (2) a generic, or untrained, task involving the detection of a gray-scale target (a “T” shape, present or absent) embedded among gray-scale distractors (“L” shapes) of variable conspicuity and camouflage, with random positions and orientations, overlaid on a fractal noise background. We operationalize our overarching hypothesis as the following specific hypotheses:

Unlike the untrained task, the trained task will elicit distinct predictive codes in the radiologists’ brains before stimulus onset, taking the form of spatially distributed brain templates.

Ensemble activity patterns during both tasks will evolve in space and time over the course of a trial, such that predictive coding gives way to stimulus coding from before to after image onset.

Perceptual expectation will reduce the neural response amplitude in brain regions that support predictive codes but improve stimulus representation, as revealed by multivoxel pattern analysis.

In both tasks, legitimate pre-stimulus predictive templates will predict the radiologists’ behavioral performance in the post-stimulus period.

Top-down facilitation before stimulus onset (the anticipatory processing phase) will differ significantly between the two tasks: the trained task provides greater access to learned, memory-based information, enabling a fast, automated, and efficient response.
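
The third hypothesis relies on multivoxel pattern analysis (MVPA), in which “stimulus representation” is often quantified as how well a classifier can decode the condition from voxel patterns. The sketch below shows one common scheme, a leave-one-run-out correlation classifier, applied to synthetic data; the function name, the run structure, and the noise level are all hypothetical and stand in for real trial-wise fMRI estimates.

```python
import numpy as np


def loro_correlation_decoding(patterns, labels, runs):
    """Leave-one-run-out correlation classifier (a common MVPA scheme).

    patterns: (n_trials, n_voxels) response estimates
    labels:   (n_trials,) condition codes, e.g., 1 = C+, 0 = C-
    runs:     (n_trials,) run index used to split training and test data
    Returns the mean classification accuracy across held-out runs.
    """
    accuracies = []
    classes = np.unique(labels)
    for held_out in np.unique(runs):
        train, test = runs != held_out, runs == held_out
        # Condition "templates": mean training pattern for each class.
        templates = np.array(
            [patterns[train & (labels == c)].mean(axis=0) for c in classes]
        )
        correct = 0
        for x, y in zip(patterns[test], labels[test]):
            # Assign the class whose template correlates best with the test pattern.
            r = [np.corrcoef(x, t)[0, 1] for t in templates]
            correct += int(classes[int(np.argmax(r))] == y)
        accuracies.append(correct / test.sum())
    return float(np.mean(accuracies))


# Synthetic example (hypothetical sizes): 6 runs x 20 trials, 50 voxels.
rng = np.random.default_rng(1)
n_runs, per_run, n_vox = 6, 20, 50
runs = np.repeat(np.arange(n_runs), per_run)
labels = np.tile(np.repeat([0, 1], per_run // 2), n_runs)
signal = rng.normal(size=(2, n_vox))  # one hypothetical pattern per condition
patterns = signal[labels] + rng.normal(scale=2.0, size=(labels.size, n_vox))
acc = loro_correlation_decoding(patterns, labels, runs)
print(round(acc, 2))
```

Holding out whole runs keeps training and test trials independent; in the proposed design, the same analysis could be run separately for expected and unexpected trials to compare decoding accuracy against response amplitude.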

To arbitrate between the possible mechanisms underlying image-based diagnosis, we will use the two probabilistic fMRI visual search tasks described above. The radiology-specific “trained task” involves discriminating mammograms containing cancer from those without cancer. The non-radiological, or “untrained,” task involves detecting a grayscale “T” shape on a textured gray fractal background that also contains one or more distractor “L” shapes of various orientations and levels of conspicuity.[36] The two sets of images are similar in visual content, but only the mammograms are contextually relevant to a radiologist’s training. Both tasks require systematic visual search with a high degree of top-down attentional focus. In both tasks, three different auditory cues will manipulate expectations about the upcoming visual stimulus [Table 1]. Each cue will be associated with a different probability of receiving a cancer-positive mammogram (or, in the untrained task, an image containing a “T”): a 25%, 50%, or 75% chance, referred to as low, medium, and high, respectively [Table 1].

Table 1: Probability of receiving a mammogram containing cancer (task 1) or an image containing a “T” (task 2) after each auditory cue. The difficulty of detecting targets is held constant.

Audio cue | Cancer + or “T” is present (T+) | Cancer - or “T” is absent (T-)
Cue 1: Low probability | 25% | 75%
Cue 2: Medium probability | 50% | 50%
Cue 3: High probability | 75% | 25%
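
One way to realize these probabilities in a run is to fix the exact number of positive trials per cue and shuffle, rather than sampling each trial independently, so that the cue-outcome contingency holds exactly. The sketch below assumes a hypothetical number of trials per cue (a multiple of 4, so that 25% and 75% are whole numbers) and hypothetical sound names; the helpers `make_trials` and `assign_sounds` are illustrative, not part of the proposed protocol.

```python
import itertools
import random

# Probability of a positive trial (C+ or T+) for each cue level (Table 1).
CUE_P_POSITIVE = {"low": 0.25, "medium": 0.50, "high": 0.75}


def make_trials(trials_per_cue, seed=None):
    """Shuffled (cue, is_positive) list whose positive rate per cue is exact."""
    if trials_per_cue % 4:
        raise ValueError("trials_per_cue must be a multiple of 4")
    rng = random.Random(seed)
    trials = []
    for cue, p in CUE_P_POSITIVE.items():
        n_pos = round(trials_per_cue * p)
        outcomes = [True] * n_pos + [False] * (trials_per_cue - n_pos)
        trials += [(cue, positive) for positive in outcomes]
    rng.shuffle(trials)
    return trials


def assign_sounds(subject_id, sounds=("tone_A", "tone_B", "tone_C")):
    """Counterbalance which (hypothetical) sound signals which cue level."""
    orders = list(itertools.permutations(sounds))
    return dict(zip(CUE_P_POSITIVE, orders[subject_id % len(orders)]))


trials = make_trials(20, seed=1)
print(trials[:3], assign_sounds(subject_id=3))
```

Cycling through all six sound-to-cue permutations across subjects keeps any acoustic property of a particular tone from being confounded with a particular probability level.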

The goal of the proposed experiments is to formally compare predictive coding models with stimulus content models and stimulus plus expectation models. We will combine fMRI data with model-based analyses in these tasks to quantify the evidence for each model, including physiological measures such as skin conductance responses (SCR) and pupil dilation, and to contrast the trained task with the untrained task, which is expected to require greater cognitive resources. Figure 1 outlines the fMRI probabilistic search task paradigm.

Figure 1: Subjects will see a central fixation cross for 15 s (the interstimulus interval, ISI). An auditory cue indicating the probability that the upcoming image contains cancer (task 1) or a “T” (task 2) will sound for 300 ms before the visual stimulus. The stimulus is shown for 6 s in both the trained (task 1) and untrained (task 2) tasks and is followed by a variable delay of 3 to 5 s. A rating screen then appears for 3 s, during which subjects report whether the last stimulus contained cancer (C+) or not (C-), or contained a T (T+) or not (T-). The auditory cues are associated with 25%, 50%, and 75% probabilities of cancer or T, and their assignment will be counterbalanced across subjects. The difficulty levels of tasks 1 and 2 will be held constant, whereas the ratio of cancer-positive (C+) to cancer-negative (C-) images, or of T+ to T- images, will vary according to the probabilities given in Table 1. Distractors in task 2 are highlighted by red boxes, and the correct target stimulus is circled in green.
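
The trial structure in Figure 1 translates directly into event onsets for an fMRI design matrix. The sketch below assumes that the image appears immediately at cue offset and that the 3 to 5 s delay is drawn uniformly; neither detail is specified above. The names `TrialTiming` and `schedule` are hypothetical.

```python
import random
from dataclasses import dataclass

FIXATION_S = 15.0           # central fixation cross (ISI)
CUE_S = 0.3                 # auditory cue
STIMULUS_S = 6.0            # mammogram (task 1) or T/L display (task 2)
RATING_S = 3.0              # C+/C- or T+/T- report
DELAY_RANGE_S = (3.0, 5.0)  # variable post-stimulus delay


@dataclass
class TrialTiming:
    cue_onset: float
    stimulus_onset: float
    rating_onset: float
    trial_end: float


def schedule(n_trials, seed=None):
    """Event onsets (seconds from run start) for consecutive trials."""
    rng = random.Random(seed)
    t = 0.0
    timings = []
    for _ in range(n_trials):
        cue_onset = t + FIXATION_S
        stimulus_onset = cue_onset + CUE_S     # assumption: image follows cue offset
        delay = rng.uniform(*DELAY_RANGE_S)    # assumption: uniform jitter
        rating_onset = stimulus_onset + STIMULUS_S + delay
        t = rating_onset + RATING_S
        timings.append(TrialTiming(cue_onset, stimulus_onset, rating_onset, t))
    return timings


timings = schedule(n_trials=12, seed=7)
print(timings[0])
```

Each trial therefore lasts between 27.3 and 29.3 s; jittering the delay helps separate the stimulus-evoked and rating-related hemodynamic responses.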

The first model to be tested is a pure stimulus salience-coding model in which physiological responses are a simple function of the stimulus input:

$$y = wS \tag{1}$$

Here, S is the stimulus salience, dummy-coded as 0 for cancer-negative mammograms and 1 for cancer-positive mammograms, and w is a free scaling parameter. Because of the dummy coding, w equals the mean distance between the responses to mammograms with and without cancer. Since only two stimulus levels are used, this distance is compatible with any stimulus-response function. In this model, expectation [Figure 2a; cues on the x-axis] has no effect on the measured responses. The second model is the stimulus salience plus expectation model [Figure 2b], which assumes that responses to mammograms are the sum of two additive effects, expectation and the visual stimulus itself, as described by Equation 2:

$$y = w_1 S + w_2 P \tag{2}$$

Figure 2: Hypotheses and design. (a) The stimulus coding model is insensitive to predictive cues and responds only to the visual stimulus. (b) Expectation may have an additive effect on brain responses, such that a higher expectation of receiving a mammogram with cancer corresponds to increased salience and larger physiological responses. (c) The predictive coding model has two components: prediction and prediction error (PE). Activity in visual processing regions increases with increasing predictions of visual stimuli. If the stimulus is a mammogram containing cancer, a PE signaling the difference between the sensory input and the prediction occurs; we model the PE for mammograms without cancer as zero. The hypothesized predictive coding response is a weighted sum of the two components, with two free weight parameters that are both constrained to be positive. Solid lines represent equal weighting, and dashed lines represent a higher weighting for the PE.

Here, S is the stimulus salience, dummy-coded as in Equation 1, and P is the expectation given by the probability associated with each of the three auditory cues (i.e., 0.25, 0.5, or 0.75). The weights w1 and w2 are free parameters. Parameter w1 sets the distance between the two lines representing mammograms with and without cancer and, because of the dummy coding, can accommodate any stimulus-response function in the current design, which contains only two types of mammograms. The expectation of receiving a cancer-positive mammogram is assumed to have an additive, linear effect on the measured response. Hence, the basic relationship between stimulus salience and response could take any form, but it would be subject to linear modulation by expectation. Finally, the predictive coding model states that the physiological responses (fMRI parameter estimates, SCR, and pupil dilation) equal the weighted sum of prediction (P) and PE [Figure 2c]:

$$y = w_3 P + w_4\,\mathrm{PE} \tag{3}$$

In Equation 3, PE is the prediction error: the difference between the outcome and the prediction when the outcome is a cancer-positive mammogram (coded as 1, as in Equation 1, so that PE = 1 - P). For mammograms without cancer, PE is 0.[39]
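
The three candidate models can be written as regressor sets and compared on the same data. The sketch below builds the regressors for Equations 1 to 3, fits each by ordinary least squares, and compares them with a Gaussian Bayesian information criterion (BIC). The simulated data, the 40 trials per cue, the chosen weights, and the helper names `design` and `fit_bic` are hypothetical; BIC is one possible comparison criterion, not necessarily the one the proposed study will use. Note that plain least squares does not enforce the positivity constraint on w3 and w4; a non-negative solver such as `scipy.optimize.nnls` could be substituted for that.

```python
import numpy as np

CUE_P = np.array([0.25, 0.50, 0.75])  # cue-based expectation of a positive image


def design(model, S, P):
    """Regressor matrix for each candidate model (no intercept, as in Eqs. 1-3)."""
    if model == "stimulus":              # Eq. 1: y = w*S
        return np.column_stack([S])
    if model == "stimulus+expectation":  # Eq. 2: y = w1*S + w2*P
        return np.column_stack([S, P])
    if model == "predictive":            # Eq. 3: y = w3*P + w4*PE
        PE = np.where(S == 1, 1.0 - P, 0.0)  # outcome (1) minus prediction; 0 for C-
        return np.column_stack([P, PE])
    raise ValueError(f"unknown model: {model}")


def fit_bic(y, X):
    """Ordinary least squares fit and Gaussian BIC (lower is better)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ w) ** 2))
    n, k = X.shape
    return w, n * np.log(rss / n) + k * np.log(n)


# Hypothetical simulation: 40 trials per cue with exact C+ proportions (Table 1),
# responses generated by the predictive coding model with w3 = 1, w4 = 2.
rng = np.random.default_rng(0)
P_trial = np.repeat(CUE_P, 40)
S = np.concatenate(
    [np.concatenate([np.ones(int(40 * p)), np.zeros(40 - int(40 * p))]) for p in CUE_P]
)
y = design("predictive", S, P_trial) @ np.array([1.0, 2.0])
y = y + rng.normal(scale=0.2, size=S.size)

for name in ("stimulus", "stimulus+expectation", "predictive"):
    w, bic = fit_bic(y, design(name, S, P_trial))
    print(name, np.round(w, 2), round(bic, 1))
```

The key difference is that PE = S(1 - P) introduces an interaction between stimulus and expectation: for cancer-positive mammograms the response falls as the prediction rises, while for cancer-negative mammograms it rises. The additive model in Equation 2 can only produce parallel lines, so it cannot capture this pattern.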

CONCLUSION

In this paper, we reviewed the available literature on potential neural mechanisms of radiological expertise, specifically the FFA, the use of long-term memory during working memory tasks, and predictive coding. We proposed that a predictive coding framework localized to the FFA is a promising approach for modeling radiological expertise and outlined a set of experiments to test this idea. Understanding the neural mechanisms of image-based diagnosis is, in the authors’ opinion, a worthwhile pursuit in and of itself. For clinical care, a better understanding of the neural mechanisms of radiological expertise could inform the design of better training paradigms for radiological education. Metacognition, being aware of and monitoring one’s own thinking, has been shown to improve learning.[37] Knowledge of the underlying neural mechanism and template could likewise enhance trainee education and provide a first step toward better clinical care. It is conceivable that such understanding could lead to devices that alert physicians to deviations from normal neural signals and provide feedback; an example from another field is the Air Force Research Laboratories’ individual neural learning system.[38] In day-to-day practice, one can imagine a device that alerts radiologists whenever they are more prone to errors. For trainees, the same device could be used to quantify and characterize the transition from novice to expert. Elucidating the underlying mechanism therefore has the potential to improve radiological education, training, and, ultimately, patient care. The results could also apply broadly to other tasks that require visual detection, such as satellite image analysis and airport security.
