Multidimensional machine learning for early neurological deterioration prediction in acute ischemic stroke

Abstract

Objective:

This study aimed to develop and validate a multidimensional clinical feature-based machine learning model for accurately predicting the risk of early neurological deterioration (END) in patients with acute ischemic stroke (AIS).

Methods:

A total of 338 AIS patients were randomly divided into a training set (n = 236) and a validation set (n = 102). Five core predictors were identified from multiple clinical and pathological indicators: admission National Institutes of Health Stroke Scale (NIHSS) score, admission blood glucose, infarct core volume, collateral circulation status, and neutrophil-to-lymphocyte ratio (NLR). In the training set, univariate analysis was first performed to screen factors associated with END. After variable compression via least absolute shrinkage and selection operator (LASSO) regression, multivariate logistic regression was employed to determine independent risk factors for END. Three prediction models were constructed in Python: Random Forest (RF), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), and the optimal model was selected.

Results:

No statistically significant differences were observed in baseline characteristics between the training and validation sets (P > 0.05). Multivariate logistic regression revealed that admission NIHSS score, blood glucose, infarct core volume, and NLR were independent risk factors (P < 0.05), while collateral circulation status was an independent protective factor (P < 0.05). The RF model demonstrated superior predictive performance, with AUC values of 0.779 (training set) and 0.775 (validation set), significantly outperforming KNN (0.727, 0.741) and GBM (0.736, 0.665).

Conclusion:

The multidimensional model provides a potential practical tool for early clinical identification of high-risk END patients and timely intervention.

Introduction

Acute ischemic stroke (AIS) is one of the leading causes of disability and mortality worldwide, imposing a substantial burden on society and families (1). Despite landmark advances in hyperacute revascularization therapies, the post-stroke disease course remains highly variable, with a significant proportion of patients at risk of clinical progression (2). Among these challenges, early neurological deterioration (END) represents a common yet formidable clinical complication following AIS (3). END typically refers to unexpected worsening or fluctuation in neurological deficits within the initial hours to days after stroke onset (4). Although its precise definition varies slightly across studies, a commonly adopted criterion is an increase of ≥ 2 or ≥ 4 points on the National Institutes of Health Stroke Scale (NIHSS). Extensive literature indicates that END occurs in 10–40% of patients and is strongly associated with poor long-term functional outcomes, prolonged hospitalization, and elevated mortality (5).

The pathophysiological mechanisms underlying END are complex and multifactorial, involving several key aspects: First, ischemic progression serves as the central mechanism, wherein thrombus extension or new emboli lead to the conversion of the ischemic penumbra into the infarct core, directly driving neurological decline (6). Second, secondary brain injury processes, such as excessive activation of post-ischemic inflammation, excitatory amino acid toxicity, and massive free radical production, collectively exacerbate blood-brain barrier disruption and neuronal death (7). Additionally, systemic factors, including fever, infection, dysglycemia, and hemodynamic instability, may contribute to END by reducing cerebral perfusion or increasing metabolic demand (8).

Currently, clinical practice lacks a unified and efficient tool for END prediction. Traditional approaches predominantly rely on physicians’ empirical judgment or isolated predictors, such as higher baseline NIHSS scores, admission hyperglycemia, or large vessel occlusion on imaging (9, 10). However, these individual indicators exhibit limited predictive performance, often failing to balance specificity and sensitivity, and inadequately capture the multifactorial nature of END. Therefore, there is an urgent need for a predictive model capable of integrating multidimensional data to quantify individual risk, enabling early identification of high-risk patients.

Advances in medical informatics have provided novel solutions to such complex challenges through machine learning (ML). ML algorithms can autonomously learn intricate nonlinear relationships and interactions from high-dimensional data, uncovering patterns beyond conventional statistical models (11). In stroke research, ML has been successfully applied to diagnostic classification, outcome prediction, and imaging analysis (12). A multidimensional ML model incorporating clinical assessments, neuroimaging, and serum biomarkers may theoretically provide a more comprehensive representation of END pathophysiology, thereby enabling more precise risk stratification.

Against this background, this study aimed to retrospectively collect clinical data from AIS patients, identify core predictors closely associated with END, and develop a robust predictive model using advanced ML algorithms. It is anticipated that this model will serve as an early and convenient decision-support tool for clinicians, facilitating timely intervention for high-risk patients and ultimately improving outcomes.

Materials and methods

Study population

This single-center, retrospective observational study consecutively screened AIS patients admitted to the Department of Neurology between June 2022 and June 2025. Inclusion criteria were: (1) age ≥ 18 years; (2) time from onset to admission < 24 h (13); (3) completion of baseline magnetic resonance imaging (including DWI and PWI sequences) and relevant laboratory tests upon admission. Exclusion criteria included: (1) receipt of intravenous thrombolysis or endovascular thrombectomy (to avoid reperfusion-related confounding); (2) death or discharge within 24 h of admission; (3) severe organ dysfunction or active infection; (4) incomplete clinical or imaging data. Ultimately, 338 patients were included in the analysis.

Data collection

Data were collected using a predefined standardized case report form. All variables were independently extracted by two researchers from electronic medical records and cross-checked for accuracy. Demographic data (age, gender) and medical history (hypertension, diabetes, atrial fibrillation, prior stroke, or transient ischemic attack [TIA]) were recorded. Admission clinical features included NIHSS score (baseline), Glasgow Coma Scale (GCS) score, first-measured systolic blood pressure, and point-of-care glucose. Imaging data were analyzed by two blinded neuroradiologists unaware of clinical outcomes. MRI was performed using a 3.0T scanner (Siemens Skyra) with the following sequences: diffusion-weighted imaging (DWI, b-values 0 and 1,000 s/mm², slice thickness 5 mm), perfusion-weighted imaging (PWI, dynamic susceptibility contrast, 20 time-points), and time-of-flight MR angiography. Infarct core volume was measured as the volume of the restricted diffusion area on DWI, calculated using semi-automated segmentation software (Olea Sphere, v3.0). Hypoperfusion volume was defined as the area with prolonged mean transit time (>145% of contralateral hemisphere) on PWI. Collateral circulation status was assessed on MR angiography using the modified Tan scale, with good collateral defined as collateral supply filling > 50% of the distal middle cerebral artery territory. Laboratory data consisted of the first venous blood test results post-admission. The neutrophil-to-lymphocyte ratio (NLR) was calculated by dividing absolute neutrophil count by absolute lymphocyte count from complete blood cell analysis. Inflammatory markers, such as C-reactive protein (CRP), were also documented.

Outcome definition

The primary outcome of this study was END, operationally defined as an increase of ≥ 4 points in the total NIHSS score within 72 h of admission compared to baseline. To ensure consistency and accuracy in assessment, all NIHSS scores were independently evaluated by two attending neurologists (or higher-ranking physicians) who underwent standardized training and were blinded to other study data. Based on the outcome definition, patients were categorized into two groups. END group: Patients with an increase of ≥ 4 points in NIHSS score within 72 h of admission. Non-END group: Patients with an increase of < 4 points in NIHSS score within 72 h of admission.
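The outcome rule above can be expressed as a short check. The sketch below is illustrative only: the function name `is_end`, the argument names, and the choice to compare the highest NIHSS score recorded within 72 h against baseline are assumptions, not details reported by the study.

```python
from typing import Sequence

END_THRESHOLD = 4  # NIHSS increase (points) that defines END in this study


def is_end(baseline_nihss: int, followup_nihss_72h: Sequence[int]) -> bool:
    """Return True if a follow-up score within 72 h is >= 4 points above baseline.

    Assumption: deterioration is judged against the highest follow-up score.
    """
    if not followup_nihss_72h:
        return False  # no follow-up assessments, so no documented deterioration
    return max(followup_nihss_72h) - baseline_nihss >= END_THRESHOLD


# Hypothetical patients, for illustration only
print(is_end(8, [9, 12, 10]))  # peak increase of 4 points -> END group
print(is_end(8, [9, 11]))      # peak increase of 3 points -> non-END group
```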

Sample size estimation

Sample size was determined using the events per variable (EPV) principle. Based on literature and clinical experience, the estimated incidence of END was 24%, and five candidate predictors were initially planned. With an EPV ≥ 5 considered acceptable for preliminary exploration, the minimum required events were 25, yielding a required sample size of 25/0.24 ≈ 104 patients. After accounting for a potential 20% data loss, the minimum enrollment target was set at 104/0.8 = 130 patients.

This target represented the minimum requirement for theoretical feasibility. However, given the exploratory nature of the study and the need to ensure adequate power for subsequent analyses, we enrolled all eligible patients consecutively admitted between June 2022 and June 2025 (n = 338). Patients were randomly divided into training (n = 236, 70%) and validation (n = 102, 30%) sets. In the training set, 57 END events occurred. The final model included five predictors, corresponding to an EPV of 11.4 (57/5), which exceeds the EPV ≥ 5 criterion and supports the adequacy of the sample size for model development.
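The EPV arithmetic described above can be reproduced in a few lines. This is a minimal sketch using only the figures stated in the text; the function and variable names are illustrative.

```python
def required_sample_size(n_predictors: int, epv: float,
                         incidence: float, loss_rate: float):
    """Return (required events, analyzable patients, enrollment target)."""
    events = n_predictors * epv                 # minimum number of END events
    n_analyzable = events / incidence           # patients needed to observe them
    n_enrolled = n_analyzable / (1 - loss_rate) # inflate for expected data loss
    return events, n_analyzable, n_enrolled


events, n_analyzable, n_enrolled = required_sample_size(
    n_predictors=5, epv=5, incidence=0.24, loss_rate=0.20
)
print(f"required events: {events}")
print(f"analyzable patients: {n_analyzable:.1f}")
print(f"enrollment target: {n_enrolled:.1f}")

# EPV actually achieved in the training set: 57 END events, 5 predictors
achieved_epv = 57 / 5
print(f"achieved EPV: {achieved_epv:.1f}")
```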

Statistical analysis

Statistical analyses were performed using SPSS 26.0 and R 4.2.3. Normally distributed continuous variables were expressed as mean ± standard deviation (SD) and compared with Student’s t-test, while non-normally distributed data were presented as median (interquartile range) and analyzed using the Mann-Whitney U test. Categorical variables were reported as counts (percentages) and compared using the χ² test. In the training set, univariate analysis was first conducted to screen variables with P < 0.05. Least absolute shrinkage and selection operator (LASSO) regression was then applied to compress the candidate variables, retaining the most predictive core variables and reducing the risk of overfitting caused by an excessive number of variables. Multivariate logistic regression was subsequently used to quantify the independent predictive effect of each core variable on END and to identify independent risk and protective factors, with odds ratios (OR) and 95% confidence intervals (CI) calculated for each. Random Forest (RF), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN) models were constructed using Python 3.8.5 with the scikit-learn library. To mitigate potential bias due to class imbalance, the RF model was trained with the class_weight = "balanced" parameter, which weights classes inversely to their frequencies. To assess model stability, 5-fold cross-validation was additionally performed on the training set. Receiver operating characteristic (ROC) curves were plotted with GraphPad Prism 9.0. Model performance was further evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at the optimal probability threshold determined by the Youden index. Calibration was assessed by plotting observed versus predicted probabilities (calibration curve) and quantified using the Brier score. SHapley Additive exPlanations (SHAP) analysis was applied to quantify the feature importance of the core predictors and to interpret the machine learning models, and a nomogram was developed based on the optimal model. A P value < 0.05 was considered statistically significant.
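The modeling workflow described above might look roughly as follows in scikit-learn. This is a sketch under stated assumptions, not the authors' code: the data are synthetic (generated with `make_classification` to mimic 338 cases, five predictors, and roughly 24% events), the split is stratified, and all hyperparameters other than `class_weight="balanced"` are placeholders that the paper does not report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 338 cases, 5 predictors.
X, y = make_classification(
    n_samples=338, n_features=5, n_informative=4, n_redundant=1,
    weights=[0.76], random_state=0,
)
# 70/30 split; stratification is an assumption, the paper states a random split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

models = {
    # class_weight="balanced" is the only RF setting reported in the paper.
    "RF": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
    "GBM": GradientBoostingClassifier(random_state=0),
    # KNN is distance-based, so features are standardized first (assumption).
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated AUC on the training set
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    model.fit(X_train, y_train)
    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    results[name] = (cv_auc.mean(), cv_auc.std(), val_auc)
    print(f"{name}: CV AUC {cv_auc.mean():.3f} ± {cv_auc.std():.3f}, "
          f"validation AUC {val_auc:.3f}")
```

Because the data are synthetic, the printed AUC values will not match those reported in the Results.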

Results

Comparison of baseline characteristics between training and validation sets

A total of 338 participants were enrolled and divided into a training set (n = 236, 70%) and a validation set (n = 102, 30%). No significant differences were observed in baseline characteristics between the two sets (P > 0.05) (Table 1).

| Variables | Training set (n = 236) | Validation set (n = 102) | t/χ² | P |
|---|---|---|---|---|
| Age (years) | 67.81 ± 10.94 | 66.83 ± 11.89 | 0.736 | 0.462 |
| Gender, n (%) | | | 0.814 | 0.367 |
| Male | 142 (60.17) | 56 (54.90) | | |
| Female | 94 (39.83) | 46 (45.10) | | |
| History of hypertension, n (%) | | | 0.378 | 0.539 |
| Yes | 172 (72.88) | 71 (69.61) | | |
| No | 64 (27.12) | 31 (30.39) | | |
| History of diabetes, n (%) | | | 0.187 | 0.666 |
| Yes | 75 (31.78) | 30 (29.41) | | |
| No | 161 (68.22) | 72 (70.59) | | |
| History of atrial fibrillation, n (%) | | | 0.152 | 0.697 |
| Yes | 60 (25.42) | 28 (27.45) | | |
| No | 176 (74.58) | 74 (72.55) | | |
| History of stroke/TIA, n (%) | | | 0.217 | 0.641 |
| Yes | 54 (22.88) | 21 (20.59) | | |
| No | 182 (77.12) | 81 (79.41) | | |
| Admission NIHSS score | 10.23 ± 5.79 | 9.71 ± 5.94 | 0.752 | 0.453 |
| Baseline impaired consciousness, n (%) | | | 0.194 | 0.660 |
| GCS < 15 | 68 (28.81) | 27 (26.47) | | |
| GCS ≥ 15 | 168 (71.19) | 75 (73.53) | | |
| Admission systolic blood pressure (mmHg) | 153.11 ± 23.84 | 150.42 ± 26.07 | 0.925 | 0.355 |
| Admission blood glucose (mmol/L) | 7.94 ± 2.68 | 7.55 ± 2.43 | 1.262 | 0.208 |
| Infarct core volume (mL) | 28.51 ± 25.10 | 26.84 ± 24.31 | 0.567 | 0.571 |
| Hypoperfusion volume (mL) | 82.50 ± 52.03 | 78.91 ± 49.83 | 0.590 | 0.556 |
| Large artery occlusion, n (%) | | | 0.029 | 0.865 |
| Yes | 125 (52.97) | 53 (51.96) | | |
| No | 111 (47.03) | 49 (48.04) | | |
| Collateral circulation, n (%) | | | 0.184 | 0.668 |
| Good | 124 (52.54) | 51 (50.00) | | |
| Poor | 112 (47.46) | 51 (50.00) | | |
| Neutrophil-to-lymphocyte ratio | 5.49 ± 3.81 | 5.23 ± 3.60 | 0.585 | 0.559 |
| C-reactive protein (mg/L) | 8.14 ± 7.52 | 7.53 ± 6.79 | 0.704 | 0.482 |
| Early neurological deterioration, n (%) | | | 0.015 | 0.902 |
| Deterioration | 57 (24.15) | 24 (23.53) | | |
| No deterioration | 179 (75.85) | 78 (76.47) | | |

Table 1. Comparison of baseline characteristics between training and validation sets.

Univariate analysis of risk factors for early neurological deterioration in acute ischemic stroke

Univariate analysis revealed statistically significant differences between the END and non-END groups in the training cohort for the following five parameters: admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, and NLR (all P < 0.05) (Table 2).

| Variables | END group (n = 57) | Non-END group (n = 179) | t/χ² | P |
|---|---|---|---|---|
| Age (years) | 68.93 ± 10.84 | 67.21 ± 11.12 | 1.023 | 0.307 |
| Sex, n (%) | | | 0.162 | 0.687 |
| Male | 33 (57.89) | 109 (60.89) | | |
| Female | 24 (42.11) | 70 (39.11) | | |
| History of hypertension, n (%) | | | 0.707 | 0.401 |
| Yes | 44 (77.19) | 128 (71.51) | | |
| No | 13 (22.81) | 51 (28.49) | | |
| History of diabetes, n (%) | | | 1.611 | 0.204 |
| Yes | 22 (38.60) | 53 (29.61) | | |
| No | 35 (61.40) | 126 (70.39) | | |
| History of atrial fibrillation, n (%) | | | 0.768 | 0.381 |
| Yes | 17 (29.82) | 43 (24.02) | | |
| No | 40 (70.18) | 136 (75.98) | | |
| History of stroke/TIA, n (%) | | | 0.502 | 0.479 |
| Yes | 15 (26.32) | 39 (21.79) | | |
| No | 42 (73.68) | 140 (78.21) | | |
| Admission NIHSS score | 11.82 ± 5.13 | 9.90 ± 4.17 | 2.857 | 0.005 |
| Baseline impaired consciousness, n (%) | | | 3.507 | 0.061 |
| GCS < 15 | 22 (38.60) | 46 (25.70) | | |
| GCS ≥ 15 | 35 (61.40) | 133 (74.30) | | |
| Admission systolic blood pressure (mmHg) | 156.23 ± 25.14 | 151.83 ± 23.49 | 1.211 | 0.227 |
| Admission blood glucose (mmol/L) | 8.92 ± 2.79 | 7.74 ± 2.31 | 3.188 | 0.002 |
| Infarct core volume (mL) | 38.90 ± 26.71 | 26.82 ± 21.33 | 3.494 | 0.001 |
| Hypoperfusion volume (mL) | 93.14 ± 55.27 | 78.54 ± 48.15 | 1.922 | 0.056 |
| Large artery occlusion, n (%) | | | 3.134 | 0.077 |
| Yes | 36 (63.16) | 89 (49.72) | | |
| No | 21 (36.84) | 90 (50.28) | | |
| Collateral circulation, n (%) | | | 5.886 | 0.015 |
| Good | 20 (35.09) | 104 (58.10) | | |
| Poor | 37 (64.91) | 75 (41.90) | | |
| Neutrophil-to-lymphocyte ratio | 6.45 ± 3.91 | 4.80 ± 3.21 | 3.200 | 0.002 |
| C-reactive protein (mg/L) | 11.83 ± 10.26 | 9.82 ± 7.04 | 1.667 | 0.097 |

Table 2. Univariate analysis of factors influencing early neurological deterioration in acute ischemic stroke.

NIHSS, National Institutes of Health Stroke Scale.

LASSO regression for feature selection

To identify the most predictive features for END while preventing overfitting, LASSO regression was employed for feature selection. Figure 1 illustrates the coefficient paths of candidate predictors as the regularization parameter λ (log-transformed) increases. At lower λ values, most features are retained because the model is less constrained. With increasing λ (i.e., stronger regularization), coefficients are progressively shrunk toward zero. At the optimal λ, five features (admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, and NLR) were retained, while the others were excluded because of negligible predictive contributions.

Line graph displaying binomial deviance on the y-axis and the logarithm of lambda on the x-axis, with colored lines for different numbers of variables. Error bars are shown for each point, and two dashed vertical lines indicate important lambda values.

Figure 1. LASSO regression coefficient paths.
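The shrinkage behaviour described above can be illustrated with the soft-thresholding operator. This is a deliberate simplification: soft-thresholding gives the exact LASSO solution only for squared-error loss with an orthonormal design, whereas the study fitted a penalized logistic model. The feature names and unpenalized coefficients below are hypothetical.

```python
def soft_threshold(coef: float, lam: float) -> float:
    """Shrink a coefficient toward zero by lam; set it to zero if |coef| <= lam."""
    if coef > lam:
        return coef - lam
    if coef < -lam:
        return coef + lam
    return 0.0


# Hypothetical unpenalized coefficients, for illustration only
unpenalized = {"feature_a": 0.90, "feature_b": -0.50,
               "feature_c": 0.12, "feature_d": -0.05}


def retained_features(lam: float) -> dict:
    """Return the features whose coefficients remain non-zero at penalty lam."""
    shrunk = {name: soft_threshold(b, lam) for name, b in unpenalized.items()}
    return {name: round(b, 2) for name, b in shrunk.items() if b != 0.0}


# As the penalty grows, more coefficients are driven exactly to zero.
for lam in (0.0, 0.1, 0.3, 0.6):
    print(f"lambda={lam}: {retained_features(lam)}")
```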

Multivariate logistic regression analysis of factors influencing early neurological deterioration

Early neurological deterioration was set as the dependent variable (0 = non-END group, 1 = END group). Five predictive variables were retained: admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, and NLR. Multivariate logistic regression demonstrated that admission NIHSS score, admission blood glucose, infarct core volume, and NLR were independent risk factors for END (P < 0.05), while robust collateral circulation served as an independent protective factor (Table 3).

| Variables | β | SE | Wald | P | OR | 95% CI |
|---|---|---|---|---|---|---|
| Admission NIHSS score | 0.101 | 0.038 | 6.978 | 0.008 | 1.106 | 1.026–1.192 |
| Admission blood glucose | 0.186 | 0.071 | 6.793 | 0.009 | 1.204 | 1.047–1.384 |
| Infarct core volume | 0.026 | 0.007 | 12.443 | 0.001 | 1.027 | 1.012–1.042 |
| Collateral circulation status | −0.870 | 0.343 | 6.421 | 0.011 | 0.419 | 0.214–0.821 |
| NLR | 0.171 | 0.050 | 11.934 | 0.001 | 1.187 | 1.077–1.308 |

Table 3. Multivariate logistic regression analysis of factors influencing END.

NIHSS, National Institutes of Health Stroke Scale; NLR, Neutrophil-to-lymphocyte ratio. Assignment of independent variables (categorical variables): Collateral circulation status (Poor = 0, Good = 1).
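For readers who want to check how the odds ratios in Table 3 relate to the coefficients, the sketch below applies the standard relations OR = exp(β) and 95% CI = exp(β ± 1.96 × SE) to the β and SE values from the table. Because the published β and SE are rounded to three decimals, the recomputed values agree with the table only approximately (to within about 0.002).

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval

# (beta, SE) pairs copied from Table 3
table3 = {
    "Admission NIHSS score": (0.101, 0.038),
    "Admission blood glucose": (0.186, 0.071),
    "Infarct core volume": (0.026, 0.007),
    "Collateral circulation status": (-0.870, 0.343),
    "NLR": (0.171, 0.050),
}


def odds_ratio_with_ci(beta: float, se: float, z: float = Z_95):
    """Return (OR, lower 95% limit, upper 95% limit) from a logit coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


recomputed = {name: odds_ratio_with_ci(b, se) for name, (b, se) in table3.items()}
for name, (or_, lo, hi) in recomputed.items():
    print(f"{name}: OR {or_:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

An OR below 1, as for collateral circulation status (coded Poor = 0, Good = 1), corresponds to a negative β and hence a protective association.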

Predictive performance of models in training and validation sets

The RF, KNN, and GBM models were evaluated. In the training set, the AUC values were 0.779, 0.727, and 0.736, respectively; in the validation set, the corresponding AUC values were 0.775, 0.741, and 0.665. The 5-fold cross-validation results further confirmed the internal stability of the RF model, yielding a mean AUC of 0.772 ± 0.018 in the training set. The RF model, which achieved the highest AUC, was selected as the optimal predictive model (Figure 2). At the optimal probability threshold determined by the Youden index (0.26 in the training set, 0.25 in the validation set), the RF model achieved the following performance metrics in the training set: accuracy = 0.754, sensitivity = 0.702, specificity = 0.771, PPV = 0.494, and NPV = 0.891. In the validation set, the corresponding values were: accuracy = 0.735, sensitivity = 0.667, specificity = 0.756, PPV = 0.457, and NPV = 0.882. Calibration of the RF model was assessed by plotting observed versus predicted probabilities (Figure 3). The calibration curve lay close to the diagonal line, indicating good agreement between predicted and actual risks. The Brier score was 0.152 in the training set and 0.161 in the validation set, further confirming adequate prediction accuracy.
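The threshold-based metrics above all derive from a 2 × 2 confusion matrix. The paper does not report the underlying counts; the training-set counts used below (40 true positives, 17 false negatives, 138 true negatives, 41 false positives) are an assumption, chosen because they are consistent with 57 END and 179 non-END patients and reproduce the reported values to within about 0.001.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute threshold-dependent metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Youden index: the quantity maximized when choosing the threshold
        "youden_j": sensitivity + specificity - 1,
    }


# Assumed training-set counts (not reported in the paper)
training = classification_metrics(tp=40, fn=17, tn=138, fp=41)
for name, value in training.items():
    print(f"{name}: {value:.3f}")
```

The modest PPV alongside a high NPV follows from the event rate of roughly 24%: at this prevalence, a negative prediction is more informative than a positive one.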

Panel A shows a receiver operating characteristic (ROC) curve comparing three models: random forest (AUC 0.779), k-nearest neighbors (AUC 0.727), and gradient boosting (AUC 0.736). Panel B presents ROC curves for the same models with random forest (AUC 0.775), k-nearest neighbors (AUC 0.741), and gradient boosting (AUC 0.665), each plotted against sensitivity and 1-specificity. A diagonal dashed line indicates chance-level performance. Legends specify model and area under the curve with confidence intervals.

Figure 2. Receiver operating characteristic curves (A: training set; B: validation set). RF, Random Forest; GBM, Gradient Boosting Machine; and KNN, K-Nearest Neighbors.

Side-by-side calibration plots labeled A and B display observed probability versus predicted probability for three models: RF in blue, KNN in red, and GB in yellow, with a diagonal grey line for ideal calibration. Both plots show model calibration performance, highlighting deviations from the ideal line across predicted probabilities, with each model following a distinct curve.

Figure 3. Calibration analysis of the prediction model in the training set (A) and validation set (B). RF, Random Forest; GBM, Gradient Boosting Machine; and KNN, K-Nearest Neighbors.

Construction of the prediction model for early neurological deterioration

As the number of decision trees increased, the error stabilized, reflecting the dynamic performance of the model during iterative tree construction. This trend aided in assessing model convergence: when the error curve plateaued, further trees provided marginal improvement, guiding the selection of an optimal tree count to balance complexity and predictive accuracy (Supplementary Figure 1).

The RF model ranked the independent predictors of END by importance scores in descending order: admission blood glucose, infarct core volume, collateral circulation status, NLR, and admission NIHSS score (Figure 4).

Variable importance plot with two side-by-side scatterplots. The left plot displays Mean Decrease Accuracy for variables X1 through X5, with X2 showing the highest value. The right plot presents Mean Decrease Gini for the same variables, again with X2 having the highest importance. Both plots rank variable contributions to model accuracy and purity.

Figure 4. Feature importance ranking in the RF model. X1, Admission National Institutes of Health Stroke Scale score; X2, Admission blood glucose; X3, Infarct core volume; X4, Collateral circulation status; X5, Neutrophil-to-lymphocyte ratio.

Interpretability of model predictions

Figure 5 presents a case-specific SHapley Additive exPlanations (SHAP) waterfall plot that illustrates how the model arrived at its prediction for an individual patient. Admission NIHSS score, collateral circulation status, and NLR contributed positively to the prediction, whereas admission blood glucose and infarct core volume exerted negative effects. In the plot, f(x) denotes the model output for this case, which equals the expected model output E[f(x)] plus the sum of the SHAP values of the individual features. To facilitate the clinical translation of our model, we further constructed a simple risk score nomogram based on the five core predictors (Figure 6). This nomogram translates the model’s complex algorithm into an intuitive visual tool, allowing clinicians to quickly estimate the probability of END in individual AIS patients by summing the points assigned to each variable and mapping the total score to the corresponding risk probability.

Waterfall chart illustrating prediction contributions by feature, where X3 lowers the prediction by 1.32, X2 lowers it by 1.15, two features increase it by 0.441, and X5 lowers it by 0.323, progressing from an initial value of 1.96 to an expected value of 0.399.

Figure 5. SHapley Additive exPlanations waterfall plot. X1, Admission National Institutes of Health Stroke Scale score; X2, Admission blood glucose; X3, Infarct core volume; X4, Collateral circulation status; X5, Neutrophil-to-lymphocyte ratio.

Nomogram graphic for stroke risk prediction showing scales for points, admission NIHSS score, admission blood glucose, infarct core volume, collateral circulation status, NLR, total points, linear predictor, and risk.

Figure 6. Nomogram for predicting early neurological deterioration risk in acute ischemic stroke patients. NIHSS, National Institutes of Health Stroke Scale; NLR, Neutrophil-to-lymphocyte ratio. Assignment of independent variables (categorical variables): Collateral circulation status (Poor = 0, Good = 1).
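The additive logic behind a SHAP waterfall plot can be shown with an exact Shapley computation on a toy model. Everything in this sketch is hypothetical: `toy_model` is not the study's RF, the patient and reference values are invented, and the base value is taken as the model output at a single reference point rather than an average over background data, which is a simplification.

```python
from itertools import combinations
from math import factorial


def toy_model(nihss: float, glucose: float, nlr: float) -> float:
    """Hypothetical score with one interaction term (not the study's model)."""
    return 0.1 * nihss + 0.05 * glucose * nlr


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's values, the rest the baseline's.
        args = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(*args)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(set(subset) | {i}) - value(set(subset)))
        phi.append(total)
    return phi


x = (12, 9.0, 6.0)         # hypothetical patient: NIHSS, glucose, NLR
baseline = (10, 8.0, 5.0)  # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(*baseline)
prediction = toy_model(*x)

for name, contribution in zip(("NIHSS", "glucose", "NLR"), phi):
    print(f"{name}: {contribution:+.3f}")
print(f"base value {base_value:.3f} + contributions {sum(phi):.3f} "
      f"= prediction {prediction:.3f}")
```

The final line checks the additivity property: the contributions sum to the difference between the individual prediction and the base value, which is what the bars of a waterfall plot depict.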

Discussion

The primary objective of this study was to develop and validate a machine learning model based on multidimensional data for predicting the risk of END in patients with AIS. Five core predictors were identified in a cohort of patients who did not receive reperfusion therapy: admission NIHSS score, infarct core volume, collateral circulation status, NLR, and admission blood glucose. An RF prediction model was then constructed. Its AUC values in the training and validation sets were 0.779 and 0.775, respectively, indicating good discriminatory ability and generalization performance. The five core indicators screened in this study are not isolated risk labels but joint
