Objective:
This study aimed to develop and validate a multidimensional clinical feature-based machine learning model for accurately predicting the risk of early neurological deterioration (END) in patients with acute ischemic stroke (AIS).
Methods:
A total of 338 AIS patients were randomly divided into a training set (n = 236) and a validation set (n = 102). Five core predictors were identified from multiple clinical and pathological indicators: admission National Institutes of Health Stroke Scale (NIHSS) score, admission blood glucose, infarct core volume, collateral circulation status, and neutrophil-to-lymphocyte ratio (NLR). In the training set, univariate analysis was first performed to screen factors associated with END. After variable compression via least absolute shrinkage and selection operator (LASSO) regression, multivariate logistic regression was employed to determine independent risk factors for END. Using Python, three prediction models were constructed: Random Forest (RF), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), and the optimal model was selected.
Results:
No statistically significant differences were observed in baseline characteristics between the training and validation sets (P > 0.05). Multivariate logistic regression revealed that admission NIHSS score, blood glucose, infarct core volume, and NLR were independent risk factors (P < 0.05), while collateral circulation status was an independent protective factor (P < 0.05). The RF model demonstrated the best predictive performance, with AUC values of 0.779 (training set) and 0.775 (validation set), exceeding those of KNN (0.727, 0.741) and GBM (0.736, 0.665).
Conclusion:
The multidimensional model provides a potential practical tool for early clinical identification of high-risk END patients and timely intervention.
Introduction
Acute ischemic stroke (AIS) is one of the leading causes of disability and mortality worldwide, imposing a substantial burden on society and families (1). Despite landmark advances in hyperacute revascularization therapies, the post-stroke disease course remains highly variable, with a significant proportion of patients at risk of clinical progression (2). Among these challenges, early neurological deterioration (END) represents a common yet formidable clinical complication following AIS (3). END typically refers to unexpected worsening or fluctuation in neurological deficits within the initial hours to days after stroke onset (4). Although its precise definition varies slightly across studies, a commonly adopted criterion is an increase of ≥ 2 or ≥ 4 points on the National Institutes of Health Stroke Scale (NIHSS). Extensive literature indicates that END occurs in 10–40% of patients and is strongly associated with poor long-term functional outcomes, prolonged hospitalization, and elevated mortality (5).
The pathophysiological mechanisms underlying END are complex and multifactorial, involving several key aspects: First, ischemic progression serves as the central mechanism, wherein thrombus extension or new emboli lead to the conversion of the ischemic penumbra into the infarct core, directly driving neurological decline (6). Second, secondary brain injury processes, such as excessive activation of post-ischemic inflammation, excitatory amino acid toxicity, and massive free radical production, collectively exacerbate blood-brain barrier disruption and neuronal death (7). Additionally, systemic factors, including fever, infection, dysglycemia, and hemodynamic instability, may contribute to END by reducing cerebral perfusion or increasing metabolic demand (8).
Currently, clinical practice lacks a unified and efficient tool for END prediction. Traditional approaches predominantly rely on physicians’ empirical judgment or isolated predictors, such as higher baseline NIHSS scores, admission hyperglycemia, or large vessel occlusion on imaging (9, 10). However, these individual indicators exhibit limited predictive performance, often failing to balance specificity and sensitivity, and inadequately capture the multifactorial nature of END. Therefore, there is an urgent need for a predictive model capable of integrating multidimensional data to quantify individual risk, enabling early identification of high-risk patients.
Advances in medical informatics have provided novel solutions to such complex challenges through machine learning (ML). ML algorithms can autonomously learn intricate nonlinear relationships and interactions from high-dimensional data, uncovering patterns beyond conventional statistical models (11). In stroke research, ML has been successfully applied to diagnostic classification, outcome prediction, and imaging analysis (12). A multidimensional ML model incorporating clinical assessments, neuroimaging, and serum biomarkers may theoretically provide a more comprehensive representation of END pathophysiology, thereby enabling more precise risk stratification.
Against this background, this study aimed to retrospectively collect clinical data from AIS patients, identify core predictors closely associated with END, and develop a robust predictive model using advanced ML algorithms. It is anticipated that this model will serve as an early and convenient decision-support tool for clinicians, facilitating timely intervention for high-risk patients and ultimately improving outcomes.
Materials and methods
Study population
This single-center, retrospective observational study consecutively screened AIS patients admitted to the Department of Neurology between June 2022 and June 2025. Inclusion criteria were: (1) age ≥ 18 years; (2) time from onset to admission < 24 h (13); (3) completion of baseline magnetic resonance imaging (including DWI and PWI sequences) and relevant laboratory tests upon admission. Exclusion criteria included: (1) receipt of intravenous thrombolysis or endovascular thrombectomy (to avoid reperfusion-related confounding); (2) death or discharge within 24 h of admission; (3) severe organ dysfunction or active infection; (4) incomplete clinical or imaging data. Ultimately, 338 patients were included in the analysis.
Data collection
Data were collected using a predefined standardized case report form. All variables were independently extracted by two researchers from electronic medical records and cross-checked for accuracy. Demographic data (age, gender) and medical history (hypertension, diabetes, atrial fibrillation, prior stroke, or transient ischemic attack [TIA]) were recorded. Admission clinical features included NIHSS score (baseline), Glasgow Coma Scale (GCS) score, first-measured systolic blood pressure, and point-of-care glucose. Imaging data were analyzed by two blinded neuroradiologists unaware of clinical outcomes. MRI was performed using a 3.0T scanner (Siemens Skyra) with the following sequences: diffusion-weighted imaging (DWI, b-values 0 and 1,000 s/mm², slice thickness 5 mm), perfusion-weighted imaging (PWI, dynamic susceptibility contrast, 20 time-points), and time-of-flight MR angiography. Infarct core volume was measured as the volume of the restricted diffusion area on DWI, calculated using semi-automated segmentation software (Olea Sphere, v3.0). Hypoperfusion volume was defined as the area with prolonged mean transit time (>145% of contralateral hemisphere) on PWI. Collateral circulation status was assessed on MR angiography using the modified Tan scale, with good collateral defined as collateral supply filling > 50% of the distal middle cerebral artery territory. Laboratory data consisted of the first venous blood test results post-admission. The neutrophil-to-lymphocyte ratio (NLR) was calculated by dividing absolute neutrophil count by absolute lymphocyte count from complete blood cell analysis. Inflammatory markers, such as C-reactive protein (CRP), were also documented.
Outcome definition
The primary outcome of this study was END, operationally defined as an increase of ≥ 4 points in the total NIHSS score within 72 h of admission compared to baseline. To ensure consistency and accuracy in assessment, all NIHSS scores were independently evaluated by two attending neurologists (or higher-ranking physicians) who underwent standardized training and were blinded to other study data. Based on the outcome definition, patients were categorized into two groups. END group: Patients with an increase of ≥ 4 points in NIHSS score within 72 h of admission. Non-END group: Patients with an increase of < 4 points in NIHSS score within 72 h of admission.
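The outcome rule above can be written as a small labeling function. This is only a sketch: the function name `has_end` and the assumption that each patient has a list of NIHSS reassessments recorded within 72 h are illustrative and not taken from the study protocol.

```python
def has_end(baseline_nihss, nihss_within_72h, min_increase=4):
    """Return True if any NIHSS reassessment within 72 h of admission
    exceeds the baseline score by at least `min_increase` points.

    Assumption: `nihss_within_72h` holds every total NIHSS score
    recorded in the first 72 h (hypothetical data layout).
    """
    return any(score - baseline_nihss >= min_increase
               for score in nihss_within_72h)


# Hypothetical patients: baseline 8, reassessed twice within 72 h.
print(has_end(8, [9, 12]))  # increase of 4 points -> END group
print(has_end(8, [9, 11]))  # maximum increase of 3 points -> non-END group
```

Changing `min_increase` to 2 gives the alternative criterion mentioned in the Introduction.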
Sample size estimation
Sample size was determined using the events per variable (EPV) principle. Based on literature and clinical experience, the estimated incidence of END was 24%, and five candidate predictors were initially planned. With an EPV ≥ 5 considered acceptable for preliminary exploration, the minimum required events were 25, yielding a required sample size of 25/0.24 ≈ 104 patients. After accounting for a potential 20% data loss, the minimum enrollment target was set at 104/0.8 = 130 patients.
This target represented the minimum requirement for theoretical feasibility. However, given the exploratory nature of the study and the need to ensure adequate power for subsequent analyses, we enrolled all eligible patients consecutively admitted between June 2022 and June 2025 (n = 338). Patients were randomly divided into training (n = 236, 70%) and validation (n = 102, 30%) sets. In the training set, 57 END events occurred. The final model included five predictors, corresponding to an EPV of 11.4 (57/5), which substantially exceeds the EPV ≥ 5 criterion and indicates that the training set contained enough events for the number of predictors modeled.
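The EPV arithmetic above can be reproduced in a few lines. The helper names `min_sample_size` and `observed_epv` are illustrative; the inputs are the figures quoted in the text.

```python
def min_sample_size(n_predictors, epv, incidence, loss_rate):
    """Minimum events, complete cases, and enrollment under the EPV rule."""
    events = n_predictors * epv               # required outcome events
    n_complete = events / incidence           # patients needed to observe them
    n_enrolled = n_complete / (1 - loss_rate) # inflate for expected data loss
    return events, n_complete, n_enrolled


def observed_epv(n_events, n_predictors):
    """Events per variable actually achieved in a fitted model."""
    return n_events / n_predictors


events, n_complete, n_enrolled = min_sample_size(5, 5, 0.24, 0.20)
print(events, round(n_complete, 1), round(n_enrolled, 1))
print(observed_epv(57, 5))
```

The unrounded values are about 104.2 and 130.2 patients; the text rounds the first to 104 before dividing by 0.8, which gives the stated target of 130.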
Statistical analysis
Statistical analyses were performed using SPSS 26.0 and R 4.2.3. Normally distributed continuous variables were expressed as mean ± standard deviation (SD) and compared with Student’s t-test, while non-normally distributed data were presented as median (interquartile range) and analyzed using the Mann-Whitney U test. Categorical variables were reported as counts (percentages) and compared via the χ² test. In the training set, univariate analysis was first conducted to screen variables with P < 0.05. Least absolute shrinkage and selection operator (LASSO) regression was then applied to retain the most predictive core variables and to limit overfitting caused by excessive variables. Multivariate logistic regression was subsequently used to quantify the independent predictive effect of each core variable on END, identifying independent risk and protective factors together with their odds ratios (OR) and 95% confidence intervals (CI). Random Forest (RF), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN) models were constructed using Python 3.8.5 with the scikit-learn library. To mitigate potential bias due to class imbalance, the RF model was trained with the class_weight = “balanced” parameter to adjust class weights accordingly. To ensure robust evaluation of model stability, we additionally performed 5-fold cross-validation on the training set. Receiver operating characteristic (ROC) curves were plotted with GraphPad Prism 9.0. Model performance was further evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at the optimal probability threshold determined by the Youden index. Calibration was assessed by plotting observed versus predicted probabilities (calibration curve) and quantified using the Brier score.
SHapley Additive exPlanations (SHAP) analysis was applied to quantify the feature importance of core predictors and interpret the machine learning models. A nomogram was then developed based on the optimal model. A P value < 0.05 was considered statistically significant.
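The modeling workflow described above might look roughly like the following scikit-learn sketch. It runs on synthetic data from `make_classification`, not the study cohort. The hyperparameters, the stratified split, and the feature scaling for KNN are assumptions, since the text specifies only `class_weight="balanced"` for RF and 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 338 samples, 5 predictors, roughly 24% positives.
X, y = make_classification(n_samples=338, n_features=5, n_informative=4,
                           n_redundant=1, weights=[0.76, 0.24],
                           random_state=0)
# 236 / 102 split as in the text; stratification is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=102, stratify=y, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
    # KNN is distance-based, so features are standardized first (assumption).
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated AUC on the training set only.
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    model.fit(X_train, y_train)
    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    results[name] = (cv_auc.mean(), cv_auc.std(), val_auc)
    print(f"{name}: CV AUC {cv_auc.mean():.3f} ± {cv_auc.std():.3f}, "
          f"validation AUC {val_auc:.3f}")
```

Because the data are synthetic, the printed AUC values are unrelated to those reported in the Results.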
Results
Comparison of baseline characteristics between training and validation sets
A total of 338 participants were enrolled and divided into a training set (n = 236, 70%) and a validation set (n = 102, 30%). No significant differences were observed in baseline characteristics between the two sets (P > 0.05) (Table 1).
| Variables | Training set (n = 236) | Validation set (n = 102) | t/χ² | P |
|---|---|---|---|---|
| Age (years) | 67.81 ± 10.94 | 66.83 ± 11.89 | 0.736 | 0.462 |
| Gender, n (%) | | | | |
| Male | 142 (60.17) | 56 (54.90) | 0.814 | 0.367 |
| Female | 94 (39.83) | 46 (45.10) | | |
| History of hypertension, n (%) | | | | |
| Yes | 172 (72.88) | 71 (69.61) | 0.378 | 0.539 |
| No | 64 (27.12) | 31 (30.39) | | |
| History of diabetes, n (%) | | | | |
| Yes | 75 (31.78) | 30 (29.41) | 0.187 | 0.666 |
| No | 161 (68.22) | 72 (70.59) | | |
| History of atrial fibrillation, n (%) | | | | |
| Yes | 60 (25.42) | 28 (27.45) | 0.152 | 0.697 |
| No | 176 (74.58) | 74 (72.55) | | |
| History of stroke/TIA, n (%) | | | | |
| Yes | 54 (22.88) | 21 (20.59) | 0.217 | 0.641 |
| No | 182 (77.12) | 81 (79.41) | | |
| Admission NIHSS score | 10.23 ± 5.79 | 9.71 ± 5.94 | 0.752 | 0.453 |
| Baseline impaired consciousness, n (%) | | | | |
| GCS < 15 | 68 (28.81) | 27 (26.47) | 0.194 | 0.660 |
| GCS ≥ 15 | 168 (71.19) | 75 (73.53) | | |
| Admission systolic blood pressure (mmHg) | 153.11 ± 23.84 | 150.42 ± 26.07 | 0.925 | 0.355 |
| Admission blood glucose (mmol/L) | 7.94 ± 2.68 | 7.55 ± 2.43 | 1.262 | 0.208 |
| Infarct core volume (mL) | 28.51 ± 25.10 | 26.84 ± 24.31 | 0.567 | 0.571 |
| Hypoperfusion volume (mL) | 82.50 ± 52.03 | 78.91 ± 49.83 | 0.590 | 0.556 |
| Large artery occlusion, n (%) | | | | |
| Yes | 125 (52.97) | 53 (51.96) | 0.029 | 0.865 |
| No | 111 (47.03) | 49 (48.04) | | |
| Collateral circulation, n (%) | | | | |
| Good | 124 (52.54) | 51 (50.00) | 0.184 | 0.668 |
| Poor | 112 (47.46) | 51 (50.00) | | |
| Neutrophil-to-lymphocyte ratio | 5.49 ± 3.81 | 5.23 ± 3.60 | 0.585 | 0.559 |
| C-reactive protein (mg/L) | 8.14 ± 7.52 | 7.53 ± 6.79 | 0.704 | 0.482 |
| Early neurological deterioration, n (%) | | | | |
| Deterioration | 57 (24.15) | 24 (23.53) | 0.015 | 0.902 |
| No deterioration | 179 (75.85) | 78 (76.47) | | |

Comparison of baseline characteristics between training and validation sets.
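As a worked check of the categorical comparisons, the Pearson χ² statistic for a 2 × 2 table can be computed directly from the counts. The sketch below uses the gender and END rows of Table 1. It assumes no continuity correction was applied, which is consistent with the reported values but is not stated in the text. The function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], where rows are groups and columns are categories."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Gender: training set 142 male / 94 female, validation set 56 / 46.
chi2_gender = chi_square_2x2(142, 94, 56, 46)
# END: training set 57 END / 179 non-END, validation set 24 / 78.
chi2_end = chi_square_2x2(57, 179, 24, 78)
print(round(chi2_gender, 3), round(chi2_end, 3))
```

Both values round to the statistics listed in the table (0.814 and 0.015).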
Univariate analysis of risk factors for early neurological deterioration in acute ischemic stroke
Univariate analysis revealed statistically significant differences between the END and non-END groups in the training cohort for the following five parameters: admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, and NLR (all P < 0.05) (Table 2).
| Variables | END group (n = 57) | Non-END group (n = 179) | t/χ² | P |
|---|---|---|---|---|
| Age (years) | 68.93 ± 10.84 | 67.21 ± 11.12 | 1.023 | 0.307 |
| Sex, n (%) | | | | |
| Male | 33 (57.89) | 109 (60.89) | 0.162 | 0.687 |
| Female | 24 (42.11) | 70 (39.11) | | |
| History of hypertension, n (%) | | | | |
| Yes | 44 (77.19) | 128 (71.51) | 0.707 | 0.401 |
| No | 13 (22.81) | 51 (28.49) | | |
| History of diabetes, n (%) | | | | |
| Yes | 22 (38.60) | 53 (29.61) | 1.611 | 0.204 |
| No | 35 (61.40) | 126 (70.39) | | |
| History of atrial fibrillation, n (%) | | | | |
| Yes | 17 (29.82) | 43 (24.02) | 0.768 | 0.381 |
| No | 40 (70.18) | 136 (75.98) | | |
| History of stroke/TIA, n (%) | | | | |
| Yes | 15 (26.32) | 39 (21.79) | 0.502 | 0.479 |
| No | 42 (73.68) | 140 (78.21) | | |
| Admission NIHSS score | 11.82 ± 5.13 | 9.90 ± 4.17 | 2.857 | 0.005 |
| Baseline impaired consciousness, n (%) | | | | |
| GCS < 15 | 22 (38.60) | 46 (25.70) | 3.507 | 0.061 |
| GCS ≥ 15 | 35 (61.40) | 133 (74.30) | | |
| Admission systolic blood pressure (mmHg) | 156.23 ± 25.14 | 151.83 ± 23.49 | 1.211 | 0.227 |
| Admission blood glucose (mmol/L) | 8.92 ± 2.79 | 7.74 ± 2.31 | 3.188 | 0.002 |
| Infarct core volume (mL) | 38.90 ± 26.71 | 26.82 ± 21.33 | 3.494 | 0.001 |
| Hypoperfusion volume (mL) | 93.14 ± 55.27 | 78.54 ± 48.15 | 1.922 | 0.056 |
| Large artery occlusion, n (%) | | | | |
| Yes | 36 (63.16) | 89 (49.72) | 3.134 | 0.077 |
| No | 21 (36.84) | 90 (50.28) | | |
| Collateral circulation, n (%) | | | | |
| Good | 20 (35.09) | 104 (58.10) | 5.886 | 0.015 |
| Poor | 37 (64.91) | 75 (41.90) | | |
| Neutrophil-to-lymphocyte ratio | 6.45 ± 3.91 | 4.80 ± 3.21 | 3.200 | 0.002 |
| C-reactive protein (mg/L) | 11.83 ± 10.26 | 9.82 ± 7.04 | 1.667 | 0.097 |

Univariate analysis of factors influencing early neurological deterioration in acute ischemic stroke.
NIHSS, National Institutes of Health Stroke Scale.
LASSO regression for feature selection
To identify the most predictive features for END while preventing overfitting, LASSO regression was employed for feature selection. Figure 1 illustrates the coefficient paths of candidate predictors as the regularization parameter λ (log-transformed) increased. At lower λ values, most features were retained due to higher model complexity. With increasing λ (i.e., stronger regularization), coefficients were progressively shrunk toward zero. At the optimal threshold, five features (admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, and NLR) were retained, while others were excluded due to negligible predictive contributions.

LASSO regression coefficient paths.
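The shrinkage behavior along the coefficient path can be illustrated with the soft-thresholding operator, which is the update at the core of coordinate-descent LASSO solvers. The sketch below is a simplified illustration, not the penalized logistic fit used in the study. It applies the operator to hypothetical unpenalized coefficients, which corresponds to the LASSO solution only in the special case of an orthonormal design with squared-error loss.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients for four candidate features.
unpenalized = {"A": 0.40, "B": -0.25, "C": 0.10, "D": 0.03}
lambdas = [0.0, 0.05, 0.15, 0.30]

nonzero_counts = []
for lam in lambdas:
    path_point = {k: soft_threshold(v, lam) for k, v in unpenalized.items()}
    nonzero_counts.append(sum(1 for v in path_point.values() if v != 0.0))
    print(lam, path_point)

print(nonzero_counts)
```

As λ grows, weaker coefficients reach exactly zero first, which is why the number of retained features falls along the path.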
Multivariate logistic regression analysis of factors influencing early neurological deterioration
Early neurological deterioration was set as the dependent variable (0 = non-END group, 1 = END group). Five predictive variables were retained: admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, and NLR. Multivariate logistic regression demonstrated that admission NIHSS score, admission blood glucose, infarct core volume, and NLR were independent risk factors for END (P < 0.05), while robust collateral circulation served as an independent protective factor (Table 3).
| Variables | β | SE | Wald | P | OR | 95% CI |
|---|---|---|---|---|---|---|
| Admission NIHSS score | 0.101 | 0.038 | 6.978 | 0.008 | 1.106 | 1.026–1.192 |
| Admission blood glucose | 0.186 | 0.071 | 6.793 | 0.009 | 1.204 | 1.047–1.384 |
| Infarct core volume | 0.026 | 0.007 | 12.443 | 0.001 | 1.027 | 1.012–1.042 |
| Collateral circulation status | −0.870 | 0.343 | 6.421 | 0.011 | 0.419 | 0.214–0.821 |
| NLR | 0.171 | 0.050 | 11.934 | 0.001 | 1.187 | 1.077–1.308 |

Multivariate logistic regression analysis of factors influencing END.
NIHSS, National Institutes of Health Stroke Scale; NLR, Neutrophil-to-lymphocyte ratio. Assignment of independent variables (categorical variables): Collateral circulation status (Poor = 0, Good = 1).
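The odds ratios and confidence intervals in the regression table follow from the coefficients by exponentiation: OR = exp(β), with a Wald-type 95% CI of exp(β ± 1.96 × SE). The short check below uses the rounded β and SE values from the table, so small differences in the third decimal place are expected. The function name `odds_ratio_ci` is illustrative.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Rounded coefficients taken from the table above.
nihss = odds_ratio_ci(0.101, 0.038)
collateral = odds_ratio_ci(-0.870, 0.343)
print([round(v, 3) for v in nihss])
print([round(v, 3) for v in collateral])
```

An OR below 1 with an interval that excludes 1, as for collateral circulation (Good = 1), is what marks a variable as protective.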
Predictive performance of models in training and validation sets
The RF, KNN, and GBM models were evaluated. In the training set, the AUC values were 0.779, 0.727, and 0.736, respectively; in the validation set, the corresponding AUC values were 0.775, 0.741, and 0.665. The 5-fold cross-validation results further confirmed the internal stability of the RF model, yielding a mean AUC of 0.772 ± 0.018 in the training set. The RF model, which achieved the highest AUC, was selected as the optimal predictive model (Figure 2). At the optimal probability threshold determined by the Youden index (0.26 in the training set, 0.25 in the validation set), the RF model achieved the following performance metrics in the training set: accuracy = 0.754, sensitivity = 0.702, specificity = 0.771, PPV = 0.494, and NPV = 0.891. In the validation set, the corresponding values were: accuracy = 0.735, sensitivity = 0.667, specificity = 0.756, PPV = 0.457, and NPV = 0.882. Calibration of the RF model was assessed by plotting observed versus predicted probabilities (Figure 3). The calibration curve lay close to the diagonal line, indicating good agreement between predicted and actual risks. The Brier score was 0.152 in the training set and 0.161 in the validation set, further confirming adequate prediction accuracy.

Receiver operating characteristic curves (A: training set; B: validation set). RF, Random Forest; GBM, Gradient Boosting Machine; and KNN, K-Nearest Neighbors.

Calibration analysis of the prediction model in the training set (A) and validation set (B). RF, Random Forest; GBM, Gradient Boosting Machine; and KNN, K-Nearest Neighbors.
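The threshold-based metrics and the Brier score reported above can be computed as in the following pure-Python sketch. The labels and predicted probabilities are a small invented example, not study data. The Youden index is taken as J = sensitivity + specificity − 1, maximized over the observed probabilities as candidate cut-offs.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.
    Assumes both classes are present in y_true."""
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        tp, fp, tn, fn = confusion_counts(y_true, y_prob, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics_at(y_true, y_prob, threshold):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Invented toy example: 3 events among 10 patients.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.28, 0.35, 0.40, 0.60]

threshold, j_stat = youden_threshold(y_true, y_prob)
metrics = metrics_at(y_true, y_prob, threshold)
brier = brier_score(y_true, y_prob)
print(threshold, round(j_stat, 3), metrics, round(brier, 4))
```

With a low event rate, a cut-off well below 0.5 is typical, and PPV stays modest even when NPV is high, the same pattern seen in the reported metrics.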
Construction of the prediction model for early neurological deterioration
As the number of decision trees increased, the error stabilized, reflecting the dynamic performance of the model during iterative tree construction. This trend aided in assessing model convergence: when the error curve plateaued, further trees provided marginal improvement, guiding the selection of an optimal tree count to balance complexity and predictive accuracy (Supplementary Figure 1).
The RF model ranked the independent predictors of END by importance scores in descending order: admission blood glucose, infarct core volume, collateral circulation status, NLR, and admission NIHSS score (Figure 4).

Feature importance ranking in the RF model. X1, Admission National Institutes of Health Stroke Scale score; X2, Admission blood glucose; X3, Infarct core volume; X4, Collateral circulation status; X5, Neutrophil-to-lymphocyte ratio.
Interpretability of model predictions
Figure 5 presents a case-specific SHAP (SHapley Additive exPlanations) waterfall plot, demonstrating the model’s prediction process for an individual patient. In this case, admission NIHSS score, collateral circulation status, and NLR contributed positively to the prediction, whereas admission blood glucose and infarct core volume exerted negative effects. The f(x) value represents the model output for this patient, which equals the base value E[f(x)] plus the sum of the SHAP values of the individual features. To facilitate the clinical translation of our model, we further constructed a simple risk score nomogram based on the five core predictors (Figure 6). This nomogram translates the model’s complex algorithm into an intuitive visual tool, allowing clinicians to quickly estimate the probability of END in individual AIS patients by summing the points assigned to each variable and mapping the total score to the corresponding risk probability.

SHapley Additive exPlanations waterfall plot. X1, Admission National Institutes of Health Stroke Scale score; X2, Admission blood glucose; X3, Infarct core volume; X4, Collateral circulation status; X5, Neutrophil-to-lymphocyte ratio.
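The additive structure behind a waterfall plot can be demonstrated with exact Shapley values on a toy model. The sketch below enumerates all feature coalitions for a hypothetical five-feature risk function. It replaces absent features with a single reference patient, rather than averaging over a background dataset as SHAP explainers do. The model, its coefficients, and the patient values are all invented for illustration.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).
    Features outside a coalition take their baseline value (simplification)."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical nonlinear risk function with invented coefficients."""
    x1, x2, x3, x4, x5 = z
    score = -3.0 + 0.1 * x1 + 0.2 * x2 + 0.02 * x3 - 0.8 * x4 + 0.15 * x5
    score += 0.01 * x2 * x5  # interaction term, so effects are not purely additive
    return 1.0 / (1.0 + exp(-score))


baseline = [10.0, 8.0, 28.0, 1.0, 5.0]  # hypothetical reference patient
patient = [14.0, 7.0, 20.0, 0.0, 8.0]   # hypothetical patient to explain

phi = shapley_values(toy_risk, patient, baseline)
base_value = toy_risk(baseline)
fx = toy_risk(patient)
for name, contribution in zip(["X1", "X2", "X3", "X4", "X5"], phi):
    print(name, round(contribution, 4))
print(round(base_value, 4), round(fx, 4))
```

The contributions sum exactly to f(x) minus the base value, which is the property a waterfall plot visualizes by stacking per-feature bars from the base value up to f(x).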

Nomogram for predicting early neurological deterioration risk in acute ischemic stroke patients. NIHSS, National Institutes of Health Stroke Scale; NLR, Neutrophil-to-lymphocyte ratio. Assignment of independent variables (categorical variables): Collateral circulation status (Poor = 0, Good = 1).
Discussion
The primary objective of this study was to develop and validate a machine-learning model based on multidimensional data for predicting the risk of END in patients with AIS. Five core predictive indicators were screened from a cohort of patients who did not receive reperfusion therapy: the NIHSS score at admission, infarct core volume, collateral circulation status, NLR, and blood glucose at admission. An RF prediction model was then constructed. The AUC values of this model in the training set and validation set were 0.779 and 0.775, respectively, indicating good discriminatory ability that was maintained in the validation set. The five core indicators screened in this study are not isolated risk labels but joint