Evaluation of the effectiveness of artificial intelligence models in radiopaque and radiolucent lesions of the maxillofacial region on panoramic radiographs

The maxillofacial region houses various tissues, such as teeth, bone, cartilage, nerves, and vascular structures, making it an anatomical area where pathological formations originating from these tissues are frequently observed. Because these pathological formations are diverse and arise from different tissues, various classifications have been proposed based on their etiology, clinical appearance, histopathological features, and radiographic images. The diversity of cysts, tumors, and tumor-like lesions occurring in the maxillofacial region, along with their similar clinical and radiological characteristics, makes diagnosis and classification particularly challenging for clinicians [24].

With the use of X-rays in dental radiology, the diagnosis and identification of lesions have become easier through radiological evaluations in addition to clinical examinations. The first method commonly used in radiological assessment is direct radiography, and PRs are the most frequently preferred, as they enable the simultaneous observation of the teeth, supporting tissues, and most structures of the facial region. However, due to disadvantages such as the inability to visualize lesions in different planes, magnification, geometric distortion, superimposition, and low resolution, advanced imaging methods are often required. Techniques such as computed tomography, magnetic resonance imaging, and ultrasonography, however, are costly and not available in every facility. Additionally, interpreting the resulting images requires specialist physicians and experience. Therefore, the application of AI in clinical and radiological practice for the automatic diagnosis of lesions holds significant value [25]. In this study, the performance of AI models, which can learn from large datasets, in detecting lesions in the maxillofacial region was investigated using PRs, which are frequently utilized in clinical practice.

In the literature, studies on the use of AI in dental radiology show that the image materials forming the dataset typically consist of PRs and, less frequently, cone-beam computed tomography (CBCT) images [11]. The use of PRs, which are common in radiological diagnosis, provides an advantage for AI techniques to better learn lesion characteristics. This can enable the early detection of lesions and reduce the need for advanced imaging.

Although radiological images include numerous anatomical structures that must be scanned and evaluated, AI applications can improve the accuracy of diagnosing complex pathological formations [26]. In a study by Yang et al. [12], a DL method was developed that can detect and classify lesions of four different categories observed in the maxilla and mandible using PRs; theirs is the first known study to include datasets from both jaws. Their method used the YOLOv8 architecture with a dataset comprising specific histopathologically confirmed lesions. In our study, in addition to the entirety of the maxilla and mandible, a broader area of the facial region on PRs was included in the dataset, and the lesions targeted for detection were evaluated based on the densities they produce in radiological images rather than their histopathological characteristics. Despite using a smaller dataset, the object detection accuracy in general classification with the YOLOv8 architecture was higher in our study than in that of Yang et al. [12]. We believe this is due to the examination of a broader anatomical area and the increased diversity achieved by labeling across more regions.

In another study, with a dataset consisting of 226 PRs [13], the performance of the YOLOv8 DL architecture in detecting and segmenting radiolucent lesions in the mandible was investigated. Using the YOLOv8l sub-model without data augmentation, mAP50 values of 75.8% for object detection and 75.1% for segmentation were reported; with data augmentation, the same study achieved 97.5% and 96.6%, respectively. In comparison, our experiments with the YOLOv8m architecture, which yielded the highest performance, produced values of 75.4% for object detection and 78.4% for segmentation mask prediction for radiolucent lesions alone, without any increase in data size. We attribute the relatively higher success rates obtained in our study to the inclusion of datasets from both jaws and a larger number of data samples.
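The mAP50 values compared above count a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that criterion, with boxes as (x1, y1, x2, y2) tuples (the function names are illustrative, not taken from either study's code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def is_true_positive_at_50(pred_box, gt_box):
    """mAP50 criterion: a detection counts if IoU with ground truth >= 0.5."""
    return iou(pred_box, gt_box) >= 0.5
```

Averaging precision over recall levels under this matching rule, per class, yields the mAP50 figures reported in both studies.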

In a study conducted by Ariji et al. [14], five relatively common radiolucent lesions located exclusively in the mandible were detected and classified on PRs using DL methods, with a dataset consisting of 210 PRs. The highest success rate in object detection was achieved for dentigerous cysts at 88%, while the success rate for classification was 82%. In our study, by contrast, radiolucent and radiopaque lesions were analyzed with AI models solely on the basis of their radiological density, without differentiation between jaws or histological classification, and a sensitivity of 98.8% was achieved in binary classification. We took a more fundamental approach to the problem by evaluating the presence or absence of lesions across the entirety of PRs, and the method we used distinguished 470 out of 500 lesions, achieving an accuracy rate of 95.6%. We believe that the higher success rates in object detection and classification in our study compared to those of Ariji et al. [14] are due to the inclusion of both jaws and a larger dataset.

Failures in detecting and classifying lesions are often due to lesions that are small, have poorly defined boundaries, and produce faint radiological images. In particular, lesions that are challenging to diagnose even for experienced clinicians, owing to indistinct early-stage pathology and the superimposition of surrounding anatomical structures, can pose significant challenges for AI algorithms. For this reason, lesion classification in this study was conducted both across the entirety of PRs and on cropped lesion areas, revealing notable differences between the two approaches. The highest success rate for the classification of radiopaque and radiolucent lesions across the entirety of PRs was achieved with the VGG16 model at 68.4%, whereas for cropped images this rate increased substantially, reaching up to 97.7%. To achieve higher success rates and make the models less sensitive to pattern differences on PRs, the number and diversity of images used for classification on the entirety of PRs must be increased.
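The cropping step described above can be sketched as follows. The relative margin around the labeled bounding box and the clamping to the image bounds are our assumptions about a typical implementation, not the study's exact procedure:

```python
def crop_lesion(image_w, image_h, box, margin=0.1):
    """Expand a lesion bounding box (x1, y1, x2, y2) by a relative margin
    and clamp the result to the image, returning the crop window."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    cx1 = max(0, int(x1 - pad_x))
    cy1 = max(0, int(y1 - pad_y))
    cx2 = min(image_w, int(x2 + pad_x))
    cy2 = min(image_h, int(y2 + pad_y))
    return cx1, cy1, cx2, cy2
```

Feeding such windows to the classifier removes most of the surrounding anatomy, which is consistent with the jump from 68.4% to 97.7% observed between the whole-PR and cropped-image settings.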

A review of the literature reveals that there are relatively few studies focusing on lesion classification on the entirety of PRs using the VGG16 model. In a study aimed at diagnosing jaw tumors, lesion classification was performed on a dataset consisting of 500 PRs, including 250 ameloblastomas and 250 odontogenic keratocysts, and the sensitivity and accuracy rates of the model in the study were reported as 81.8% and 83.1%, respectively [15]. In our study, which performed classification on the entirety of PRs and compared it with this literature, we observed that the diversity and localization of lesions, as well as the inconsistency in the number of data points based on lesion types, influenced our results.

In another classification study using the VGG16 model, a dataset consisting of PRs from 115 patients with nasopalatine canal cysts and 230 control images without cysts was used, and the accuracy of the model was reported as 88.4%; in the same study, the LeNet model achieved 85.2% [27]. In our study, which classified radiopaque and radiolucent lesions on the entirety of PRs, the sensitivity and accuracy obtained with the VGG16 model were 98.8% and 68.4%, respectively. When classification was performed on cropped lesion areas with the same model, these rates averaged 93% and 94%, respectively. Comparing our whole-PR classification with that of Ito et al. [27], we observed that differences in the diversity and localization of lesions, as well as the imbalance in the number of data points across lesion types, influenced our results.

In our study, the classification problem was approached in two variants using the same dataset: analyzing the PRs as a whole and cropping the lesion-containing region. Additionally, these approaches were tested under different scenarios as binary, three-class, and four-class classifications. During this process, the CNN-based architectures AlexNet, VGG16, and GoogleNet were fine-tuned with our dataset. While the highest overall success in the classification problems was achieved with the GoogleNet architecture, the models produced more successful results when classifying cropped images than when classifying the entirety of PRs. No study in the literature compares these two approaches together; based on our results, we attribute the lower success rate of classification on the entirety of PRs to the size of the examined area, which is influenced by the complexity of surrounding anatomical structures.

In one study conducted on a dataset consisting of 412 PRs, only radiolucent lesions were detected, with an accuracy rate of 75–77%. The lower success of that study compared with the detection of radiolucent lesions in the mandible was attributed to the low contrast between maxillary lesions and surrounding structures, as well as the superimposition of anatomical structures such as the maxillary sinus, nasal cavity, hard palate, and inferior turbinate over the lesions, which makes them more challenging to interpret [16]. Additionally, another study reported that in some images the extension of the maxillary sinus to the alveolar crest level could lead AI to mistakenly identify it as a lesion [28]. In line with these findings, we observed in our study that superimpositions caused by anatomical structures in PRs can affect the success of lesion detection. In the upper jaw, the low density of the maxillary sinus with septa, the nasal fossa, and airway arches, which are particularly challenging even for clinicians to distinguish, made the detection of radiolucent lesions more difficult, while the high density of the hard palate line, the hyoid bone superimposed on the mandible, the vertebrae, and ghost images complicated the detection of radiopaque lesions. Consequently, in our study, without distinguishing between the maxilla and mandible, an average success rate of approximately 80% was achieved for detecting radiolucent lesions and approximately 70% for radiopaque lesions, resulting in an overall average of approximately 75%.

In a study by Yesiltepe et al. [29], lesions of idiopathic osteosclerosis on 493 PRs were detected using the GoogleNet-Inceptionv2 model, achieving sensitivity, precision, and F1-score values of 0.88, 0.83, and 0.86, respectively; radiopaque lesions other than idiopathic osteosclerosis were excluded from that study. In our study, by contrast, all radiopaque lesions observed in the maxillofacial region were included in the classification, object detection, and segmentation problems, and the GoogleNet architecture was similarly utilized for classification. For the classification of radiopaque and radiolucent lesions on the entirety of PRs, the GoogleNet model achieved a sensitivity of 82.0%, a precision of 70.5%, and an F1 score of 75.8%. Having obtained results comparable to those of Yesiltepe et al. [29], we attribute this success to the performance of the model, despite the fact that the lesions in our dataset are neither histopathologically uniform nor of a single density, and the number of data points is not balanced across lesion types.
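The metrics compared above follow the standard confusion-matrix definitions; a small sketch from true-positive (TP), false-positive (FP), and false-negative (FN) counts (the counts in the usage comments are illustrative, not the study's data):

```python
def sensitivity(tp, fn):
    """Recall: fraction of actual lesions that were found."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of reported lesions that were real."""
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(tp, fp), sensitivity(tp, fn)
    return 2 * p * r / (p + r)

# Example: 82 of 100 lesions found, 35 false alarms raised.
# sensitivity(82, 18) -> 0.82, precision(82, 35) -> ~0.70
```

The sensitivity/precision gap reported for GoogleNet (82.0% vs. 70.5%) thus means the model missed relatively few lesions but raised a larger share of false alarms.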

In a study by Lee et al. [30], both PR and CBCT images were utilized as a dataset, and the diagnosis and detection of odontogenic keratocysts, dentigerous cysts, and periapical cysts were evaluated using a pre-trained deep CNN architecture derived from the GoogleNet-Inceptionv3 model. The accuracy of the model was 91.4% on CBCT images and 84.6% on PRs. They attributed the higher success on CBCT to the nature of the method, which allows sectional examination in different planes without superimposition.

Abdolali et al. [31] developed a method based on asymmetry analysis for the segmentation of lesions, using a dataset consisting of CBCT images of 97 patients with cystic lesions. They reported that, owing to factors such as the varying positions and sizes of cysts and their densities being similar to those of surrounding tissues, traditional segmentation algorithms could produce a large number of false-positive pixels and exhibit poor segmentation performance. In our study, the segmentation of numerous lesions with varying densities, not only cystic images, was performed on PRs. Despite the large number of pixels in the images, which complicated the proper design of the neural network models, a success rate of 72.1% was achieved with a limited dataset.

The three-dimensional volumetric calculation of lesions in the maxillofacial region offers many advantages during the evaluation of lesions [32], and accurate segmentation allows for precise estimation of their localization and size [33]. In our study, which used AI to localize regions containing radiopaque and radiolucent lesions on PRs with bounding boxes and, more specifically, to identify the corresponding pixels within these regions, the YOLOv8 architecture, capable of real-time detection, was utilized. Ultimately, we achieved a success rate of 71.5% for lesion detection with bounding boxes and 72.1% for segmentation, corresponding to pixel-level classification. Based on these results, we believe that AI-based segmentation can facilitate volumetric assessment of the lesioned area before surgical procedures and provide valuable insights for future studies in this field.

While radicular cysts and dentigerous cysts are among the most commonly observed lesions in the maxillofacial region, other pathological formations occur less frequently, and this uneven distribution poses a challenge for assembling balanced datasets for such studies [34]. Similarly, in our study, the distribution of radiopaque and radiolucent lesions in the PRs obtained from archival materials was not equal. Lymph node calcifications and sialoliths, which produce radiopaque images and are rarely observed, were also included in the study; these lesions are few in number and exhibit varied localization patterns. This likely explains why, in our results, the AI models achieved higher success in classifying, detecting, and segmenting radiolucent lesions than radiopaque lesions.
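A common remedy for such imbalance, which was not part of this study's pipeline and is noted here only as a standard technique, is to weight the training loss inversely to class frequency so that rare radiopaque lesions contribute as much per example as common radiolucent ones:

```python
def inverse_frequency_weights(class_counts):
    """Return per-class loss weights proportional to 1/frequency,
    normalized so that the weights average to 1."""
    total = sum(class_counts)
    raw = [total / c for c in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Example: 300 radiolucent vs. 100 radiopaque training images
# -> weights [0.5, 1.5]: each rare example counts three times as much.
```

Such weights can be passed directly to the class-weight argument of most DL loss functions.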

There are studies in the literature demonstrating that the high accuracy performance of AI algorithms in diagnosis and detection is equivalent to that of expert physicians [15, 35, 36]. However, these studies generally focus on the detection of specific types of lesions that clinicians do not find challenging to distinguish or address other issues unrelated to lesions, such as tooth numbering. In our study, however, we conducted research on a wide range of lesions that clinicians might find difficult to diagnose and that are not located in specific or standard positions. Additionally, the inclusion of both radiopaque and radiolucent lesions in the maxillofacial region highlights the comprehensive aspect of the study. By increasing the amount of data, we aim to develop more generalizable AI models. Thus, we believe that the automatic detection and classification of lesions and their characteristics in the maxillofacial region could become feasible in clinical practice.

Despite the promising results obtained in this study, several limitations inherent to the AI models and the study design must be acknowledged. First, the dataset, although larger than in many previous studies, remains limited in terms of lesion diversity and distribution, particularly for rare radiopaque lesions such as sialoliths or idiopathic osteosclerosis. This imbalance may have introduced bias in model performance, favoring the detection and classification of the more prevalent lesion types. Second, the inherent limitations of PRs, such as anatomical superimposition, geometric distortion, and lower resolution compared to advanced imaging modalities, pose significant challenges for AI models. These factors can degrade performance, especially in detecting small or poorly defined lesions, or those located in anatomically complex regions such as the maxillary sinus or anterior nasal spine. Furthermore, model-specific constraints were observed. While GoogleNet achieved the highest overall classification performance, its complex architecture may increase computational demands and the risk of overfitting, particularly on smaller or less diverse datasets. VGG16, although demonstrating strong sensitivity for certain lesion types, showed lower accuracy in multi-class classification tasks, likely due to its limited capacity for hierarchical feature extraction. YOLOv8, despite its efficiency in real-time object detection, had difficulty accurately detecting small or low-contrast lesions, suggesting limitations in its detection head and feature pyramid representation for such targets.

The experimental results and Grad-CAM analysis of all data showed that radiopaque and radiolucent lesions were not confused with each other. The model's erroneous results can be attributed to the amount of data being insufficient for it to learn the problem. Especially considering the wide variation in shape, pattern, and location within each class, this study shows promising initial performance. Moreover, we are hopeful that, by increasing the amount of data for both classes, we will be able to train models suitable for practical application in clinics.
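Grad-CAM localizes the image regions driving a CNN's decision by weighting each feature map of a convolutional layer with its spatially pooled gradient, summing the weighted maps, and applying a ReLU. The core computation is sketched below with NumPy on synthetic arrays; in practice the activations and gradients would be extracted from the trained classifier via hooks:

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (channels, H, W) arrays taken from the
    target convolutional layer. Returns an (H, W) heatmap in [0, 1]."""
    weights = gradients.mean(axis=(1, 2))             # pool gradients per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum of feature maps
    cam = np.maximum(cam, 0)                          # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize for overlay
    return cam
```

Overlaying this heatmap on the PR shows whether the model attended to the lesion itself or to superimposed anatomy, which is how the confusion analysis above was informed.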

Another critical limitation lies in the generalizability of the models. All experiments were conducted on a single-institution dataset acquired using a specific panoramic radiograph system, which may limit external applicability. The lack of external validation on datasets from different populations or imaging devices constrains the models’ generalizability across diverse clinical settings. Finally, while AI models demonstrate strong potential in automating lesion detection and classification, their integration into routine dental workflows requires further investigation, including prospective clinical validation, seamless software integration, and consideration of medico-legal responsibilities. These limitations underscore the need for continued research with larger, more diverse datasets, multi-institutional collaborations, and the development of explainable AI frameworks that can provide interpretable outputs to support clinical decision-making.

In conclusion, in this study, the presence of lesions in the maxillofacial region on PRs and their differentiation based on the densities of the images they produce were evaluated using the CNN-based DL algorithms AlexNet, VGG16, and GoogleNet, with the aim of determining these models' classification success. Additionally, the performance of the CNN-based YOLOv8 model in the detection and segmentation of radiopaque and radiolucent lesions was investigated. According to the results, the ability of AI algorithms to identify lesions on PRs and classify them based on their densities is promising. We aim to integrate these AI models into hospital systems, enabling the effective application of state-of-the-art DL approaches in the clinical field and assisting dentists in lesion diagnosis.

The originality of this work stems from the problem itself and the comprehensive analysis applied to it, rather than from architectural innovation. A substantial effort was dedicated to the collection, curation, and meticulous labeling of the dataset, which required considerable domain expertise and time. Given the resource-intensive nature of this process, we made a deliberate decision to focus on fine-tuning well-established, high-performing architectures rather than developing new ones, to ensure that our results would be robust, reproducible, and directly comparable to the state-of-the-art.
