Predicting the need for medical care after toxin exposure using SHAP-interpretable gradient boosting

Abstract

Objective Experts in poison control centers (PCCs) must assess the severity of an exposure accurately and efficiently, using only the information given during a first phone call, without either delaying care or sending patients to hospital unnecessarily. To help healthcare professionals make these difficult decisions, we developed and evaluated a machine-learning algorithm that predicts whether a patient should seek medical care. It relies solely on the information provided during the first call to the PCC and covers all types of single-substance poisoning (mono-intoxication).

Methods We extracted cases recorded by clinicians at the Lyon PCC between 2000 and 2025 and excluded cases with no recorded original recommendation. We trained and compared several machine-learning models, focusing on decision-tree and gradient-boosted tree approaches. Two classification tasks were defined: (1) binary triage (referral to an emergency or non-emergency healthcare facility vs. staying at home) and (2) three-class triage (stay at home / non-emergency healthcare facility / emergency healthcare facility). Missing values were not imputed. Cross-validation and bootstrapping were used to assess the stability and statistical significance of the results. Model performance was evaluated with the F1-score and ROC AUC, and class imbalance was addressed during training. SHAP values were used to identify the features that contributed most to predictions. We compared our results with published algorithms that focus on single-substance intoxications.
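Both tasks are derived from a single recorded disposition per case. The minimal Python sketch below shows one way the two label sets and class-balancing weights could be built. The string codes, the function names and the inverse-frequency weighting scheme are illustrative assumptions, because the abstract does not say how imbalance was handled. Weights of this kind could, for example, be passed as the `sample_weight` argument of `XGBClassifier.fit`. XGBoost also learns a default branch direction for missing values, so unimputed features can be used directly.

```python
from collections import Counter

# Hypothetical disposition codes; the study's actual encoding is not reported.
HOME = "home"
NON_EMERGENCY = "non_emergency"
EMERGENCY = "emergency"

THREE_CLASS = {HOME: 0, NON_EMERGENCY: 1, EMERGENCY: 2}


def binary_label(disposition: str) -> int:
    """Task 1: 1 = any healthcare facility (emergency or not), 0 = stay at home."""
    return 0 if disposition == HOME else 1


def three_class_label(disposition: str) -> int:
    """Task 2: 0 = stay at home, 1 = non-emergency facility, 2 = emergency facility."""
    return THREE_CLASS[disposition]


def balanced_sample_weights(labels: list[int]) -> list[float]:
    """Inverse-frequency weights: n_samples / (n_classes * count[label]).

    Assumption: one common way to counter class imbalance; the abstract
    does not state which method the authors used.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


if __name__ == "__main__":
    cases = [HOME, HOME, HOME, EMERGENCY, NON_EMERGENCY, HOME]
    y2 = [binary_label(d) for d in cases]       # [0, 0, 0, 1, 1, 0]
    y3 = [three_class_label(d) for d in cases]  # [0, 0, 0, 2, 1, 0]
    print(balanced_sample_weights(y3))          # [0.5, 0.5, 0.5, 2.0, 2.0, 0.5]
```

With these weights each class contributes the same total weight to the training loss. The weights still sum to the number of cases, so the overall scale of the loss is unchanged.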

Results After processing, 220,825 cases remained. The recommended dispositions were: stay at home, 66.6%; emergency facility, 25.4%; non-emergency facility, 7.4%. For the binary task, XGBoost performed best (F1 = 0.748; ROC AUC = 0.820). For the three-class task, XGBoost again performed best (macro F1 = 0.657; multiclass ROC AUC = 0.859). The most influential features were the delay between exposure and call, SNOMED symptom codes, and the circumstances of exposure. Performance was competitive with that of published algorithms restricted to single-substance intoxications.
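The three-class result is reported as a macro-averaged F1, which gives each disposition equal weight regardless of its prevalence. As a result, macro F1 can sit well below accuracy when a minority class, such as the non-emergency facility (7.4% of cases), is hard to predict.

The Python sketch below computes macro F1 from scratch and attaches a percentile-bootstrap interval. The toy labels are invented for illustration. Resampling evaluation cases with replacement is an assumed bootstrap scheme, because the abstract does not describe the authors' exact procedure.

```python
import random


def f1_for_class(y_true, y_pred, cls):
    """One-vs-rest F1 for a single class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample cases with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data (not study data): 0 = home, 1 = non-emergency, 2 = emergency.
    y_true = [0, 0, 0, 0, 0, 0, 1, 2]
    y_pred = [0, 0, 0, 0, 0, 0, 0, 2]
    print(round(macro_f1(y_true, y_pred), 3))  # 0.641, although accuracy is 0.875
    print(bootstrap_ci(y_true, y_pred, macro_f1, n_boot=200))
```

On the toy data, missing the single non-emergency case drops that class's F1 to 0. This pulls macro F1 down to about 0.641, even though seven of the eight predictions are correct.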

Conclusion Gradient-boosted tree models can produce accurate, interpretable, and clinically relevant predictions of poisoning severity from routine PCC data. With external validation and prospective testing, such tools could complement expert judgment to improve triage consistency and patient outcomes.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was funded by Université Claude Bernard Lyon 1 (SENS 2024).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Individual patient consent was not required for this study under French law on retrospective research complying with reference methodology MR-003 (JORF no. 0160 of 13 July 2018, text no. 109). Our database (de-identified individual data) has been registered with the Commission Nationale de l'Informatique et des Libertés (registration no. 747735), in compliance with French law on electronic data sources.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The data are not available at this time.
