Evaluating the Influence of Demographic Identity in the Medical Use of Large Language Models

Abstract

As large language models (LLMs) are increasingly adopted in medical decision-making, concerns about demographic biases in AIgenerated recommendations remain unaddressed. In this study, we systematically investigate how demographic attributes—specifically race and gender—affect the diagnostic, medication, and treatment decisions of LLMs. Using the MedQA dataset, we construct a controlled evaluation framework comprising 20,000 test cases with systematically varied doctor-patient demographic pairings. We evaluate two LLMs of different scales: Claude 3.5 Sonnet, a highperformance proprietary model, and Llama 3.1-8B, a smaller open-source alternative. Our analysis reveals significant disparities in both accuracy and bias patterns across models and tasks. While Claude 3.5 Sonnet demonstrates higher overall accuracy and more stable predictions, Llama 3.1-8B exhibits greater sensitivity to demographic attributes, particularly in diagnostic reasoning. Notably, we observe the largest accuracy drop when Hispanic patients are treated by White male doctors, underscoring potential risks of bias amplification. These findings highlight the need for rigorous fairness assessments in medical AI and inform strategies to mitigate demographic biases in LLM-driven healthcare applications.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available upon reasonable request to the authors

View original article

Medrxiv - Medical Ethics

Like

Share Bookmark

0 0 0 0 0 0 0

More from this channel

Evaluating the Influence of Demographic Identity in the Medical Use of Large Language Models

Comments (0)