Background Cardiac surgery is one of the most complex and high-stakes areas of medicine, where intraoperative decisions must be made within seconds and incomplete information can compromise outcomes. Traditional risk scores and rule-based decision support tools provide limited real-time guidance and rarely integrate the unstructured data streams available during surgery. Recent advances in large language models (LLMs) such as OpenAI’s GPT-5 and Anthropic’s Claude 3.5 family have demonstrated state-of-the-art reasoning, summarization, and clinical dialogue capabilities. However, their safety and trustworthiness in surgical settings remain untested.
Objective To evaluate the feasibility and clinician-rated trustworthiness of CardiacGPT, a real-time AI assistant that uses newest-generation LLMs for intraoperative guidance and postoperative decision support.
Methods We retrospectively analyzed 500 de-identified cardiac surgery cases from Brigham and Women’s Hospital, including CABG, valve, and combined procedures. Structured EHR variables, intraoperative monitoring data, and operative notes were formatted into standardized prompts and processed by four LLMs: OpenAI GPT-5, Anthropic Claude 3.5 Opus, Claude 3.5 Sonnet, and Claude 3.5 Haiku. Outputs were presented via a blinded Bidding App to attending cardiac surgeons and ICU clinicians, who scored trust and clinical relevance on a 5-point Likert scale. The primary outcome was the proportion of high-trust ratings (score ≥ 4); secondary outcomes included mean trust scores, score variance, and inter-rater reliability.
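The primary and secondary outcomes described above reduce to simple per-model summaries of the Likert scores. The sketch below is illustrative only: `summarize_trust` and `TrustSummary` are hypothetical names, the example scores are made up rather than study data, and the abstract does not say whether "variance" means the sample or the population variance (the sketch assumes the sample variance). On a 5-point scale the high-trust proportion depends heavily on whether a score of 4 counts, so the cutoff and its inclusiveness are explicit keyword arguments rather than a fixed default.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class TrustSummary:
    n: int                  # number of ratings for the model
    mean_score: float       # secondary outcome: mean trust score
    score_variance: float   # secondary outcome: sample variance (assumed)
    high_trust_prop: float  # primary outcome: proportion of high-trust ratings


def summarize_trust(scores, *, cutoff, inclusive):
    """Summarize 1-5 Likert trust scores for one model (hypothetical helper).

    A score counts as high-trust when score >= cutoff (inclusive=True)
    or score > cutoff (inclusive=False).
    """
    scores = list(scores)
    if len(scores) < 2:
        raise ValueError("need at least two scores to compute a sample variance")
    if any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("scores must lie on the 1-5 Likert scale")
    if inclusive:
        high = sum(1 for s in scores if s >= cutoff)
    else:
        high = sum(1 for s in scores if s > cutoff)
    return TrustSummary(
        n=len(scores),
        mean_score=mean(scores),
        score_variance=variance(scores),  # sample variance, denominator n - 1
        high_trust_prop=high / len(scores),
    )


# Made-up ratings for one model, purely to show the call (not study data).
example = summarize_trust([5, 5, 4, 3], cutoff=4, inclusive=True)
print(example)
```

Running the same scores with `inclusive=False` halves the high-trust proportion in this toy case, which shows why the cutoff definition needs to be stated precisely alongside the reported percentages.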
Results Across 2,000 evaluations, GPT-5 and Claude 3.5 Opus achieved the highest mean trust scores (4.83 and 4.79, respectively), each with more than 98% high-trust ratings. Claude 3.5 Sonnet performed moderately (mean 3.9, 74% high-trust), while Claude 3.5 Haiku produced less context-specific recommendations (mean 3.6, 66% high-trust). Inter-rater reliability was excellent, with ICC(2,1) = 0.91 (95% CI 0.88–0.94), indicating strong agreement among reviewers. Qualitative analysis showed that GPT-5 and Claude 3.5 Opus generated actionable, context-aware outputs, whereas Claude 3.5 Sonnet and Haiku often produced generic or incomplete guidance.
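The reported ICC(2,1) is the Shrout and Fleiss two-way random-effects, absolute-agreement, single-rater coefficient. As a sketch of how that statistic is computed, the function below works from the mean squares of a complete targets-by-raters matrix. It assumes every rater scored every output (a fully crossed design), which the abstract does not state; `icc_2_1` is a hypothetical helper name, and the confidence interval reported above is not reproduced here.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per rated target, one column per rater.
    Assumes a complete matrix (no missing ratings).
    """
    rows = [[float(v) for v in r] for r in ratings]
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rated targets")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a complete n x k matrix with at least two raters")

    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]                        # per-target means
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]   # per-rater means

    # Mean squares for targets (rows), raters (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


# Six targets rated by four judges (the worked example in Shrout & Fleiss, 1979).
sf_example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(sf_example), 2))  # 0.29
```

On that example table the function returns about 0.29, matching the published ICC(2,1). The absolute-agreement form penalizes systematic differences between raters (the `ms_cols` term), which is the appropriate choice when reviewers' raw Likert levels, not just their rankings, are expected to agree.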
Conclusions CardiacGPT, powered by the newest LLMs (GPT-5 and Claude 3.5 series), demonstrated feasibility and exceptionally high clinician trust across 500 real-world surgical cases. This is the first blinded, multi-model evaluation of next-generation LLMs for cardiac surgery. While outcome-based prospective trials are still required, these results establish CardiacGPT as a promising real-time co-pilot for cardiac surgeons, with the potential to reduce cognitive load, standardize intraoperative communication, and improve postoperative planning.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study protocol was reviewed by the Brigham and Women's Hospital Institutional Review Board and deemed exempt for secondary analysis of deidentified data (IRB#2025A013693).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors. Deidentified clinical data are not publicly available due to institutional restrictions. Researchers may request controlled access from the corresponding author, subject to BWH/MGB data use agreements and IRB approval.