The study protocol was approved by the Institutional Review Board (IRB) with a waiver of consent, and all patient images were de-identified according to Health Insurance Portability and Accountability Act (HIPAA) rules. The initial inclusion criteria consisted of CT scans of adult patients who underwent IVC filter placement during a 10-year period from January 1st, 2009, to January 1st, 2019. Data were extracted from the picture archiving and communication system (PACS) of the Ohio State University Wexner Medical Center. A total of 2048 CT scans were obtained from 439 patients. Among them, 399 patients had retrievable filters, and 40 had non-retrievable filters (cf. Table 1). Filter types in the development dataset included ALN (retrievable), Celect (retrievable), Denali (retrievable), G2 (retrievable), OptEase (retrievable), Gunther-Tulip (retrievable), Option (retrievable), TrapEase (non-retrievable), and Greenfield (non-retrievable) (cf. Fig. 3).
Reference Standards
We employed a supervised training strategy with a training set containing annotations that include the IVC filter type and its location as 3D coordinates. The ground truth annotations of the filter types were determined based on the information available in the patient's electronic medical record. The reference annotations of the filter locations were manually marked as rectangular bounding boxes on CT scans through an interface. We assumed the center of the bounding box to be the center of the IVC filter object.
Table 1 Characteristics of the study subjects
Fig. 3 Rendered filter volumes. Filter types: a ALN, b Celect, c Denali, d G2, e OptEase, f Gunther-Tulip, g Option, h TrapEase (non-retrievable), i Greenfield (non-retrievable)
Detecting Metallic Candidates
The initial step of the framework is to locate the metallic candidates on abdominal CT scans. The inferior vena cava filter is a small metallic device. Given an abdominal CT scan, this stage of the algorithm returns a list of locations of the metallic regions based on the Hounsfield unit (HU), a measure of density on CT scans. The HU scale varies according to the density of the body part: values around \(-1000\) correspond to air, around \(-500\) to lung, \(+700\) to \(+3000\) to bone, and above \(+2000\) to metal. Instead of thresholding every slice in a CT scan with the Hounsfield scale, we first compute the CT scan's maximum intensity projection (MIP) image. MIP is a computational process that projects the highest attenuation value along one of the axes onto a two-dimensional (2D) image plane. We project the maximum intensity values of the sequential abdominal slices onto the axial plane to compute the (x, y) coordinates of metallic object candidates, and onto the coronal plane to calculate the corresponding z coordinates.
With \(I^{MIP}\) representing the MIP image of a CT scan, and \(I^{MIP}(x,y)\) representing the HU value at spatial location (x, y), we threshold the attenuation values at every spatial location \((x, y)\) in \(I^{MIP}\) as follows:
$$I^{MIP}(x,y) = \left\{ \begin{array}{ll} 0, & \text{if } I^{MIP}(x,y) < +2000 \\ I^{MIP}(x,y), & \text{if } I^{MIP}(x,y) \ge +2000 \end{array} \right.$$
(1)
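As an illustration, a minimal NumPy sketch of the MIP computation and the thresholding in Eq. (1) could look as follows; the array layout and variable names are our assumptions, and only the \(+2000\) HU cutoff comes from the text:

```python
import numpy as np

METAL_HU = 2000  # attenuation threshold for metallic objects (Eq. 1)

def mip(volume: np.ndarray, axis: int) -> np.ndarray:
    """Maximum intensity projection of a CT volume along one axis."""
    return volume.max(axis=axis)

def threshold_metal(mip_image: np.ndarray) -> np.ndarray:
    """Zero out every pixel below the metallic HU cutoff (Eq. 1)."""
    return np.where(mip_image >= METAL_HU, mip_image, 0)

# ct is a (z, y, x) volume in Hounsfield units (synthetic example data)
ct = np.random.randint(-1000, 3000, size=(120, 512, 512)).astype(np.int16)
axial_mip = threshold_metal(mip(ct, axis=0))    # yields candidate (x, y)
coronal_mip = threshold_metal(mip(ct, axis=1))  # yields candidate z
```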
The thresholding of the MIP image on the axial plane of a CT scan provides a list of (x, y) coordinates of metallic objects. The thresholding of the MIP image on the coronal plane provides the corresponding z coordinates of the metallic candidates. To localize the candidates, we apply connected component labeling, which analyzes an image and groups the pixels based on the pixel neighborhood. Let \(p \in I^{MIP}\) and \(q \in I^{MIP}\) represent pixels. S is a connected component if there is a connected path between every pair of pixels \(p, q \in S\). We used 8-connectivity to assign the labels. Each connected component S corresponds to a cropped volume with center coordinates (x, y, z). The later stage of the pipeline classifies the candidates as containing an IVC filter vs. not containing an IVC filter and locates the filter candidates with a confidence value. The candidates S that contain a filter are further processed for filter type determination.
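The paper does not give implementation details for this step; the following sketch uses scipy.ndimage.label with a 3×3 structuring element to obtain 8-connectivity, together with a simple heuristic of ours for pairing each axial candidate with a z coordinate from the coronal projection:

```python
import numpy as np
from scipy import ndimage

def extract_candidates(axial_mip: np.ndarray, coronal_mip: np.ndarray):
    """Group thresholded axial-MIP pixels into 8-connected components
    and return candidate center coordinates (x, y, z)."""
    eight_conn = np.ones((3, 3), dtype=int)  # 3x3 structure -> 8-connectivity
    labels, n = ndimage.label(axial_mip > 0, structure=eight_conn)
    centers = []
    for y_c, x_c in ndimage.center_of_mass(axial_mip > 0, labels,
                                           range(1, n + 1)):
        # Heuristic: recover z from the coronal projection at the
        # candidate's x column (assumes one metallic object per column).
        column = coronal_mip[:, int(round(x_c))]
        if column.max() > 0:
            centers.append((int(round(x_c)), int(round(y_c)),
                            int(np.argmax(column))))
    return centers
```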
Data Processing
We applied pre-processing techniques before feeding the data to the data-driven models for filter vs. no-filter classification. This stage aims to prepare the data so that more representative, discriminative features can be extracted for each class and the data-driven model can be trained more efficiently. We applied the following processing techniques (a minimal sketch of both steps follows the list):
Data normalization: This is the process of rescaling the intensity values to a common range so that each training sample has a similar data distribution. We applied Z-score normalization by subtracting the mean of the data from each instance and dividing the result by the standard deviation. Data normalization improves training performance and helps the classifier converge faster.
Data augmentation: Data-driven methods require large amounts of data to train a model. If the data are limited, the data-driven model may suffer from overfitting, which results in poor generalizability. We applied data augmentation, which increases the size of the data and the variability in appearance [19]. We randomly shifted the CT scans along the x, y, and z axes by [−3, 3] pixels and rotated them randomly between 2 and 5°.
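A minimal sketch of these two steps with NumPy/SciPy; the rotation plane, interpolation order, and boundary handling are our assumptions, while the shift range and rotation angles follow the description above:

```python
import numpy as np
from scipy import ndimage

def z_score(volume: np.ndarray) -> np.ndarray:
    """Z-score normalization: subtract the sample mean and divide by
    the sample standard deviation."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def augment(volume: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random shift of [-3, 3] pixels along each axis and a random
    rotation between 2 and 5 degrees (assumed in-plane)."""
    shift = rng.integers(-3, 4, size=3)  # inclusive [-3, 3]
    volume = ndimage.shift(volume, shift, order=1, mode="nearest")
    angle = rng.uniform(2.0, 5.0)
    return ndimage.rotate(volume, angle, axes=(1, 2), reshape=False,
                          order=1, mode="nearest")
```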
Detecting Filter Location
The previous step of the framework provides metallic object candidates S, which have higher attenuation values than the predetermined threshold for metallic objects. Several locations in the abdominal CT scan may contain metallic objects, such as surgical clips, intravascular stents, spinal fusion hardware, retained shrapnel, or swallowed foreign bodies (cf. Fig. 4). However, the appearance of these candidates differs from the appearance of an IVC filter. We utilized data-driven models that process the candidate regions, eliminate non-filter metallic objects based on appearance features, and predict the IVC filter location. We developed two data-driven models for this task. First, a two-dimensional (2D) convolutional neural network (CNN) was trained to refine the candidates. Second, we trained a recurrent CNN to exploit the spatial relationship between sequential slices. We formulate the filter refinement stage as a binary classification problem. We aim to find a function \(y=f(S)\) where the input is the metallic volume S extracted from CT scans in Section "Detecting Metallic Candidates", and the output is a binary label \(y \in \{0, 1\}\) indicating whether the region of interest contains an IVC filter.
Given a set of metallic candidate locations with their corresponding true labels, we build a training dataset \(R = \{(S_i, y_i)\}_{i=1}^{N}\), where \(S_i\) is a candidate metallic volume at coordinates \((x, y, z)\), \(y_i\) is the true label indicating whether the region of interest \(S_i\) contains an IVC filter, and N is the number of training samples. The networks are trained with the Adam optimizer [22] by minimizing the binary cross entropy
$$H_p(q) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log (p(y_i))+(1-y_i)\log (1-p(y_i))\right]$$
(2)
where \(p(y_i)\) is the predicted probability that \(S_i\) contains a filter.
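In a Keras-style sketch, training against Eq. (2) with Adam might look as follows; the learning rate, batch size, and epoch count are illustrative assumptions not reported in the text. With a two-node softmax output and one-hot labels, categorical cross entropy reduces to the binary cross entropy of Eq. (2):

```python
import tensorflow as tf

def train_filter_classifier(model, train_mips, train_labels):
    """Minimize the binary cross entropy of Eq. (2) with Adam.
    train_labels are one-hot vectors over {no filter, filter}."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",  # = Eq. (2) for 2 classes
                  metrics=["accuracy"])
    return model.fit(train_mips, train_labels,
                     batch_size=16, epochs=50, validation_split=0.1)
```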
Fig. 4 Example metallic objects in abdominal CT scans (indicated with red bounding boxes) that have higher attenuation values than the predetermined threshold for metallic objects. Several metallic objects can be present in an abdominal CT scan, such as surgical clips, intravascular stents, spinal fusion hardware, retained shrapnel, or swallowed foreign bodies
We used data-driven models to process the metallic volumes S and identify those containing IVC filters. The data-driven models are based on CNNs, which hierarchically extract representative features by convolving images with learned kernel filters [18, 23]. The strengths of CNNs are their ability to learn internal representations of images and to preserve local connectivity, which allows the architecture to learn spatial patterns. To our knowledge, CNNs have not previously been used to analyze CT scans for IVC filter detection and filter type classification. We expect the system to learn the filter's location and appearance during the model training stage.
We first developed a 2D CNN to classify the candidate volumes S for IVC filter detection. We used VGG-16 [24] as a backbone architecture, removed its fully connected layers, and inserted a fully connected layer, a dropout layer with a parameter of 0.5, and an output layer with two nodes representing the probability of S containing an IVC filter. Although we worked with a relatively large dataset, the number of scans is still limited for training a model from scratch (cf. Table 1). Therefore, we utilized a fine-tuning training strategy in which the machine learning model is initialized with weights pre-trained on another dataset; for this study, the weights were pre-trained on ImageNet [23]. To feed the 2D CNN architecture with the volume S, we computed the maximum intensity projection (MIP) image of the region of interest S. Figures 5 and 6 show MIP images of example regions of interest \(S_i\) that contain a filter and MIP images of example regions of interest \(S_i\) that have high attenuation values but do not contain a filter, respectively.
Fig. 5 MIP images of regions of interest \(S_i\) which contain a filter
Fig. 6 MIP images of regions of interest \(S_i\) which have higher attenuation values but do not contain a filter
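A Keras-style sketch of this 2D CNN; the size of the inserted fully connected layer and the input resolution are not reported in the text, so the values below are our assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_2d_cnn(input_shape=(224, 224, 3), fc_units=256):
    """VGG-16 backbone pre-trained on ImageNet, its fully connected head
    replaced by one dense layer, dropout 0.5, and a two-node softmax
    output (filter vs. no filter). fc_units is an illustrative choice."""
    backbone = tf.keras.applications.VGG16(weights="imagenet",
                                           include_top=False,
                                           input_shape=input_shape)
    x = layers.Flatten()(backbone.output)
    x = layers.Dense(fc_units, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(2, activation="softmax")(x)
    return models.Model(backbone.input, out)
```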
Analyzing the volumetric data with a 2D CNN on MIP images may lose volumetric information, namely the temporal relationship between slices. To incorporate this temporal knowledge into the training process, we developed a recurrent CNN architecture, a combination of a recurrent neural network (RNN) and a CNN. RNNs can process and learn features from sequential data, and the combination of a CNN and an RNN builds a hybrid model that captures both temporal and spatial features in the data. Abdominal CT scans can be processed as sequential data, where the change across slices is the temporal behavior; in our setting, the temporal dimension is therefore substituted with the third spatial dimension (the z-axis). We built a recurrent CNN model and processed the volume S for filter localization and filter type classification. The backbone CNN of the recurrent CNN is VGG-16 [24], and the model again utilized the fine-tuning strategy to initialize the VGG-16 weights. We extract the imaging features of each slice through the CNN, which learns the spatial knowledge, and then process the extracted features through an RNN that learns the temporal behavior of S. The RNN consists of two layers of 16 and 8 gated recurrent units (GRUs), with a dropout layer (parameter 0.5) between the GRU layers and an output layer with two nodes representing the probability of S containing an IVC filter.
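A sketch of this hybrid model in Keras; the per-slice feature pooling, input resolution, and number of slices per candidate volume are our assumptions, while the GRU sizes and dropout follow the description above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_recurrent_cnn(n_slices=32, input_shape=(224, 224, 3)):
    """VGG-16 features per slice, fed through two GRU layers (16 and 8
    units) with dropout 0.5 in between, then a two-node softmax output."""
    backbone = tf.keras.applications.VGG16(weights="imagenet",
                                           include_top=False,
                                           input_shape=input_shape)
    # Assumed pooling: collapse each slice's feature map to one vector.
    feature_extractor = models.Model(
        backbone.input, layers.GlobalAveragePooling2D()(backbone.output))

    slices = layers.Input(shape=(n_slices, *input_shape))
    x = layers.TimeDistributed(feature_extractor)(slices)  # spatial features
    x = layers.GRU(16, return_sequences=True)(x)           # temporal features
    x = layers.Dropout(0.5)(x)
    x = layers.GRU(8)(x)
    out = layers.Dense(2, activation="softmax")(x)
    return models.Model(slices, out)
```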
Detecting Filter Location—Post-Processing
The filter detection component of the system predicts the location of the IVC filter and marks it with a bounding box (BB) and an associated confidence value. The softmax probability of the model is taken as its confidence in the assigned label. One common issue in object detection systems is overlapping bounding boxes that refer to the same object. To remove these redundant bounding boxes, we applied non-maximum suppression, which selects the best bounding box for each object by taking into account the confidence value predicted by the model and the overlap score of the bounding boxes.
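A standard non-maximum suppression sketch over 2D boxes; the IoU threshold of 0.5 is a common default and an assumption here, not a value reported in the text:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-confidence box, drop boxes overlapping it above
    the threshold, and repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep
```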
Classifying Filter Type
This stage of the system processes the predicted filter locations and outputs the probability of the filter type being retrievable vs. non-retrievable. Accurate IVC filter type identification is a challenging object recognition problem due to the minor variations in appearance between the filter types (cf. Fig. 3). Additionally, the number of instances for some filter subtypes is not sufficient to train a robust classifier (cf. Table 1). Therefore, instead of applying multi-class classification, we train our model to separate the filters into two classes, retrievable vs. non-retrievable. We utilize the same architectures developed in the filter localization stage, the 2D CNN and the recurrent CNN, which are defined in Section "Detecting Filter Location". We trained the models with retrievable and non-retrievable filters so that they learn the morphological differences between filter types and determine the subtype.