In the current study, four machine learning models combining elastosonographic features and clinical variables were developed to discriminate between mild and moderate-severe fibrosis in CKD patients. The XGBoost model exhibited optimal diagnostic capability, which could serve as an effective and reliable noninvasive tool for clinical decision-making relating to CKD patients. As determined by the SHAP algorithm, eGFR contributed the most to the XGBoost model. In addition, the SHAP approach was also used to visualize and interpret the diagnostic process of the XGBoost model at the individual level.
As data processing technology develops, machine learning is increasingly being introduced into medicine to support personalized clinical decisions [19, 20]. Several studies have applied machine learning to evaluate renal fibrosis or kidney disease status. Zhu et al. used an SVM model that combined the shear wave elastography value with traditional US features to differentiate the severity of tubulointerstitial fibrosis among CKD patients and obtained AUC values between 0.64 and 0.94 [21]. However, they did not compare the performance of multiple machine learning models on this task. Li et al. constructed and compared several machine learning models based on US parameters to diagnose renal disease, yielding AUC values ranging from 0.83 to 0.91 [22]. Nevertheless, the assessment of model performance in that study was inadequate: none of the models underwent internal or external validation, so their generalizability is unknown. Finally, although these studies represented progress, they reported only how well the models performed. The models’ outputs lacked transparency and interpretability and gave no clear account of risk, which makes them difficult to implement in clinical practice [15, 23].
Four distinct machine learning models were established in this study. Among them, the XGBoost model achieved the best discrimination compared with the others (SVM, LightGBM, and KNN), yielding an AUC of 0.97 (95% CI 0.94–0.99) and an average precision of 0.97 (95% CI 0.97–0.98) in the primary dataset, and an AUC of 0.85 (95% CI 0.73–0.98) and an average precision of 0.90 (95% CI 0.86–0.93) in five-fold cross-validation. XGBoost is a scalable end-to-end tree boosting algorithm proposed by Chen et al. [24], in which multiple classification and regression trees are combined in a boosting ensemble to learn nonlinear and complex relationships between input variables and outcomes [25]. In addition to being highly efficient, flexible, and portable, it provides accurate output and helps prevent overfitting [26]. These properties make the XGBoost algorithm suitable for critical medical research, and it has been successfully applied in some complex clinical situations. Shi et al. applied a US-based radiomics XGBoost model to evaluate the risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma and attained satisfactory AUCs of 0.91 and 0.90 in the training and test cohorts, respectively, outperforming the other six machine learning classifiers and an experienced radiologist [27]. Zhang et al. found that, among the 10 machine learning models they constructed, XGBoost had the best overall diagnostic performance for predicting sentinel lymph node metastasis, yielding an AUC of 0.95 in the training cohort and 0.91 in the validation cohort [28].
Consistent with the findings stated above, in the present study, XGBoost was superior to the other classifiers using machine learning algorithms in distinguishing moderate-severe fibrosis from mild forms in CKD patients, providing further evidence of the diagnostic capability and robustness of the proposed algorithm regarding clinical application.
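The two discrimination metrics reported above, AUC and average precision, can be illustrated with a minimal sketch. The labels and scores below are made up for illustration and are not data from this study; the function names are hypothetical, and the average precision helper assumes that no two samples share the same score.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive
    sample, with samples sorted by descending score (assumes no ties)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("at least one positive sample is required")
    return total / hits


# Illustrative values only: 1 = moderate-severe fibrosis, 0 = mild fibrosis.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

AUC summarizes ranking quality across all thresholds, whereas average precision weights performance toward the highest-scoring samples, which is why the two values can differ for the same model.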
During the progression of CKD, it is crucial to underscore the significance of adopting differentiated clinical decisions and treatment strategies tailored to the distinct stages of renal fibrosis [29, 30]. The application of the proposed machine learning model facilitates the prompt identification of CKD patients presenting with mild fibrosis, thereby enabling the avoidance of aggravating factors in the initial phases of the ailment. Consequently, this affords an opportunity for early interventions, mitigating the risk of further fibrotic progression. In instances where the machine learning model identifies CKD patients with moderate-severe fibrosis, an imperative shift towards a more proactive treatment paradigm becomes warranted. This approach is designed to prevent the onset of complications, defer the initiation of dialysis treatment, and enhance the overall quality of survival. Moreover, the deployment of the developed machine learning model facilitates a non-invasive, dynamic evaluation of renal fibrosis extent during CKD treatment or follow-up. This functionality enables judicious modifications to the treatment regimen, optimizing treatment efficacy.
Following univariate and multivariate analyses, five risk factors associated with the outcome were identified from an initial pool of 18 candidate variables: shear wave elastography value, renal length, renal resistive index, hypertension, and eGFR. Shear wave elastography, an advanced non-invasive imaging modality, enables quantitative evaluation of tissue elastic properties by monitoring the propagation of shear waves induced by acoustic radiation force impulse excitation within a specified target. Previous studies have highlighted the clinical efficacy of shear wave elastography in assessing renal fibrosis [8, 9, 31]. As renal pathological impairment progresses, kidney size decreases, most notably through a reduction in renal length [32], so these changes in kidney morphology serve as external indicators of the pathological processes affecting renal tissue. Alterations in renal microvascular perfusion are also fundamental to CKD evolution. An elevated intrarenal resistive index, indicative of renal arteriolar sclerosis, correlates with advancing renal dysfunction and fibrosis [33]. Hypertension plays a critical role in both instigating and advancing renal capillary rarefaction, which reduces blood vessel density in the kidneys [34,35,36]. This reduction disturbs the oxygen supply balance and exacerbates hypoxia. Consequently, the sequence initiated by hypertension is a significant driving force in CKD progression.
Although the precise mechanism by which hypertension triggers renal capillary rarefaction remains unclear, hypoxia-induced processes within renal capillaries, including cell atrophy and apoptosis, contribute to the progression of glomerular sclerosis, renal arteriolar sclerosis, and renal tubulointerstitial fibrosis. Among liquid biopsy indicators, eGFR is a widely accepted and applied clinical marker for assessing CKD progression [16]. None of the other liquid biopsy markers was retained in the multivariate analysis. Although several other liquid biopsy indices signal the onset and progression of CKD or renal fibrosis, they may lack organ specificity, may be associated only with inflammatory states or impaired organ function, and cannot clearly distinguish fibrosis stages [37, 38]. Furthermore, the clinical information carried by eGFR overlaps with that of other liquid biopsy markers. Given its greater clinical significance, eGFR can serve as a surrogate for these alternative indicators.
A prior study employed a multilayer perceptron classifier to evaluate renal fibrosis severity by integrating 16 clinical variables, resulting in satisfactory diagnostic accuracy [39]. As a fundamental neural network, the multilayer perceptron classifier exhibits exceptional nonlinear data processing abilities [40]. Its efficacy lies in adeptly managing a substantial volume of input variables and mapping them into a higher-dimensional feature space, autonomously assigning variable weights throughout the entire training process. With an increase in input variables, the algorithm captures more valuable information, enhancing output accuracy. However, a higher quantity of input variables necessitates more neurons for feature extraction, leading to an increase in model parameters. This expansion presents challenges to convergence, resulting in prolonged training times and potential issues such as gradient explosion. Additionally, while excelling at feature extraction from relatively large datasets, the multilayer perceptron classifier tends to overfit with smaller sample sizes, reducing its generalization performance and practical applicability. Despite its input handling advantages, careful consideration is essential due to parameter escalation and potential training challenges. Moreover, incorporating additional input variables like demographic data, laboratory indicators, and imaging parameters may improve multilayer perceptron classifier predictions but raise model application costs. The multilayer perceptron classifier built using screened independent variables in this study yielded AUCs of 0.73 (95% CI 0.64–0.83) and 0.72 (95% CI 0.54–0.89) in the training and validation sets, respectively, indicating barely satisfactory diagnostic performance in this scenario (Table S2). This investigation utilized diverse machine learning algorithms, such as XGBoost, SVM, KNN, and LightGBM, to tackle the clinical issue. 
The modeling parameters of these classifiers are relatively straightforward and comprehensible. They require only a small set of variables to build models with decent predictive accuracy, and they are efficient and adaptable in practical use. The XGBoost classifier is valued for its ensemble learning capability and strong performance, delivering reliable predictions even with sub-optimal feature engineering [41]. The SVM classifier excels at handling nonlinear and high-dimensional data, exhibiting superior classification accuracy for small-scale datasets [42]. The KNN classifier, known for its simplicity and intuitive nature, makes no assumptions about data distribution, proving versatile across various data types while effectively managing nonlinear data [43]. The LightGBM classifier is a gradient boosting framework preferred for its efficient training speed [44]. The lightweight design of these algorithms and their minimal variable requirements contribute to faster training and reduced computational costs in practical applications. This aspect is particularly important in clinical settings characterized by limited computational resources or real-time processing requirements.
It should be noted that when using a machine learning algorithm to solve a crucial clinical problem, the “black box” problem of the model should be brought into the spotlight and addressed [14]. This means that the model’s decision-making process should be transparent and explainable instead of solely obtaining more accurate results. In this case, a SHAP strategy was introduced to demonstrate the importance and impact of features on the XGBoost model’s output and provide individual patients with a visual interpretation of their diagnostic results. As illustrated in the SHAP plot, the variable having the greatest impact on model output was eGFR, with lower eGFR values corresponding to higher Shapley values, driving an increased chance of model output being moderate-severe renal fibrosis. This finding of the SHAP algorithm was in line with what was seen in clinical practice, as a decline in kidney function was a warning sign that renal fibrosis would be exacerbated in CKD patients [45, 46]. Additionally, the SHAP algorithm revealed that, as the feature contributing the second highest amount to model output, a higher shear wave elastography value corresponding to a lower Shapley value reduced the likelihood of developing moderate-severe renal fibrosis, which was consistent with previous research [8, 9, 47]. Consequently, SHAP addresses the “black box” issue that has hindered the development of complex models by providing a personalized and reasonable explanation for diagnosis, significantly improving the application value of clinical models and clinicians’ confidence in established models.
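To make the idea of a per-patient Shapley attribution concrete, the following sketch computes exact Shapley values for a three-feature toy model by enumerating all feature coalitions. It is an illustration under stated assumptions, not the pipeline used in this study: the scoring function `toy_risk`, its coefficients, the example patient, and the reference (baseline) values are all hypothetical, and the SHAP library applied to tree models uses a faster algorithm than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample. Features outside a coalition
    are replaced by their baseline (reference) values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical score (not the study's model): lower eGFR, lower
    elastography value, and hypertension all raise the score."""
    egfr, swe, hypertension = z
    return (0.02 * (90 - egfr) + 0.05 * (10 - swe)
            + 0.3 * hypertension + 0.01 * (90 - egfr) * hypertension)


patient = [40, 6, 1]     # hypothetical eGFR, elastography value, hypertension
reference = [90, 10, 0]  # hypothetical reference patient
phi = shapley_values(toy_risk, patient, reference)
print(dict(zip(["eGFR", "SWE", "hypertension"], phi)))
```

The attributions sum to the difference between the patient's score and the reference score, which is the property that lets a SHAP plot decompose an individual prediction into feature-level contributions. In this toy example the interaction between eGFR and hypertension is split equally between the two features.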
Despite several strengths of this study, some aspects are worth noting. First, previous studies have identified age as an independent risk factor in renal fibrosis progression [48, 49], which aligns with the findings of the univariate analysis in this study. However, age was not retained as an independent variable in the multivariate analysis. Because age has a pathophysiological impact on shear wave elastography-measured elasticity, eGFR, renal length, and hypertension, entering all of them into the multivariate analysis simultaneously might have led to overlapping information [50,51,52]. The multivariate analysis retained shear wave elastography value, eGFR, renal length, and hypertension, each of which is affected by age, but excluded age itself. This exclusion could be attributed to these variables already capturing the diagnostic significance associated with age, rendering separate consideration of age unnecessary. Second, the use of elastography in assessing renal fibrosis remains controversial in clinical practice. Studies by Leong et al. and Yang et al. revealed an increase in shear wave elastography-measured renal stiffness corresponding to the progression of chronic renal damage characterized by glomerular sclerosis, interstitial fibrosis, and tubular atrophy [53, 54]. In contrast, our previous investigation revealed a decrease in shear wave elastography-derived elastic values as pathological damage progressed in renal fibrosis [9]. Another study, conducted by Güven et al. using magnetic resonance elastography to assess renal fibrosis, also concluded that stiffness values decreased in patients with chronic injury, specifically noting reduced stiffness as glomerulosclerosis and tubulointerstitial fibrosis progressed [55].
It is important to emphasize that the previous studies had methodological deficiencies, resulting in conclusions that differ from those reached by our study and that of Güven et al. For example, Leong et al.’s study used point shear wave elastography to detect renal fibrosis, which provides no elastogram during image acquisition and therefore hinders the identification of artifact-free regions. Furthermore, point shear wave elastography uses a fixed-size region of interest, which can lead to inaccurate placement and increased measurement variability because the renal medulla is not excluded. In Yang et al.’s study, shear wave elastography values were obtained from the inferior pole of the kidney. However, Lin et al. reported notably lower coefficients of variation in the mid-region than in the lower pole, suggesting limited reproducibility of measurements taken from the renal poles [31]. To improve reproducibility, it is recommended to refrain from measuring at the renal poles [56]. Another study by Leong et al. emphasized the importance of these factors for shear wave elastography assessment in renal fibrosis, suggesting that they could lead to inaccurate results and, therefore, erroneous conclusions [57]. Third, input variables such as shear wave elastography value, renal resistive index, and hypertension all reflect renal perfusion to some extent and could potentially introduce bias. Machine learning algorithms do not exclusively focus on direct associations between these variables. Instead, they are trained to manage multivariate feature coupling, aiming at precise predictions [58]. These algorithms process data by emphasizing collective effects among features, rather than concentrating solely on simple relationships.
By conducting comprehensive analyses and processing multiple features, these algorithms adeptly capture and leverage intricate interactions between features to enhance predictive capabilities. Their primary objective is to refine prediction accuracy by thoroughly considering the complexity of multiple features, thereby offering a more precise understanding of data patterns and trends.
This study has some limitations. First, as the number of patients enrolled in the present study is still relatively small, future studies with a large population-based cohort, which allows more detailed analyses, are warranted. Second, considering that the current study is derived from a single center cohort, further large-scale, multicenter studies are required to validate the present findings.