RATIONALE Recent studies have found that stigmatizing terms can incline physicians toward punitive approaches to patient care. The intensive care unit (ICU) generates large volumes of progress notes that may contain stigmatizing language, which can perpetuate negative biases against patients and affect healthcare delivery. Patients with substance use disorders (SUD; alcohol, opioid, and non-opioid drugs) are particularly vulnerable to stigma. This study examined the performance of Large Language Models (LLMs) in identifying stigmatizing language in ICU progress notes of patients with SUD.
METHODS Clinical notes were sampled from the Medical Information Mart for Intensive Care (MIMIC)-III database, which contains 2,083,180 ICU notes. All notes were screened with a rule-based labeling approach, followed by manual verification of more ambiguous cases; the labeling approach followed the NIH guidelines on stigma in SUD and identified 38,552 stigmatizing encounters. To construct our cohort, we randomly sampled an equal number of non-stigmatizing encounters, yielding a dataset of 77,104 notes, which was split into training/development/test sets (70%/15%/15%). We used Meta’s Llama-3 8B Instruct LLM to run the following experiments for stigma detection: (1) prompts with instructions that adhere to the NIH terms (zero-shot); (2) prompts with instructions and examples (in-context learning); (3) in-context learning with a selective retrieval system for the NIH terms (retrieval-augmented generation, RAG); and (4) supervised fine-tuning (SFT). We also created a baseline model using keyword search. Evaluation on the held-out test set comprised accuracy, macro F1 score, and error analysis, and the LLM-based approaches were prompted to provide their reasoning for each label prediction. Additionally, all approaches were evaluated on an external validation dataset of 288,130 ICU notes from the University of Wisconsin (UW) Health System.
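For concreteness, the zero-shot setup (experiment 1), the keyword-search baseline, and the test-set evaluation might look roughly like the sketch below. The model ID, prompt wording, example term list, and label parsing are illustrative assumptions, not the authors' implementation; only the model family (Llama-3 8B Instruct), the use of a keyword baseline, and the metrics (accuracy, macro F1) are taken from the Methods.

```python
# Minimal sketch only: prompt wording, term list, and label parsing are assumptions.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.metrics import accuracy_score, f1_score

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Keyword-search baseline: flag a note if any term from a (hypothetical) list of
# stigmatizing terms appears. The study used the NIH guidance list, not this one.
EXAMPLE_TERMS = ["substance abuser", "addict", "junkie", "drug habit"]
TERM_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, EXAMPLE_TERMS)) + r")\b", re.I)

def keyword_baseline(note_text: str) -> str:
    return "STIGMA" if TERM_PATTERN.search(note_text) else "NO_STIGMA"

# Zero-shot LLM classification: instructions referencing the NIH terms, with the
# model asked for a label plus its reasoning.
SYSTEM_PROMPT = (
    "You are reviewing an ICU progress note for a patient with a substance use "
    "disorder. Following NIH guidance on stigmatizing language, reply with "
    "STIGMA or NO_STIGMA on the first line, then briefly explain your reasoning."
)

def llm_zero_shot(note_text: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": note_text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    reply = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
    first_line = reply.strip().splitlines()[0].upper() if reply.strip() else ""
    if first_line.startswith("NO_STIGMA"):
        return "NO_STIGMA"
    return "STIGMA" if "STIGMA" in first_line else "NO_STIGMA"

def evaluate(notes, labels, predict):
    """Accuracy and macro F1 on a held-out test set of (note, label) pairs."""
    preds = [predict(n) for n in notes]
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
    }
```

In this sketch, `evaluate(test_notes, test_labels, llm_zero_shot)` would be compared against `evaluate(test_notes, test_labels, keyword_baseline)`; the in-context learning, RAG, and SFT variants change the prompt or model weights but leave the evaluation loop unchanged.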
RESULTS SFT had the best performance with 97.2% accuracy, followed by in-context learning. During human review, the LLMs with in-context learning and SFT provided appropriate reasoning for their false positives, and both approaches identified clinical notes with stigmatizing language that had been missed during annotation (10/93 false positives for SFT and 22/186 for in-context learning were judged valid on review). SFT maintained high accuracy (97.9%) on a similarly balanced external validation dataset.
CONCLUSION Our findings demonstrate that LLMs, particularly with SFT and in-context learning, identify stigmatizing language in ICU notes with high accuracy while explaining their reasoning in an asynchronous fashion, without the rigorous, time-intensive manual verification that labeling otherwise requires. These models also identified novel stigmatizing language not explicitly present in the training data or existing guidelines. This study highlights the potential of LLMs to reduce stigma in clinical documentation, especially for patients with SUD, by flagging language that can perpetuate negative bias towards patients and by encouraging rewriting of notes.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was funded by National Institutes of Health (NIH) grants R01LM012973 and R01DA051464. The funder played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was reviewed by the University of Wisconsin-Madison Institutional Review Board (IRB; 2023-1252) and determined to be exempt from human subjects research. The IRB approved the study with a waiver of informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The data that support the findings of this study are available from MIMIC [5], but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of MIMIC [5]. Data will be distributed through PhysioNet after the paper is accepted.