Objectives To evaluate the performance of SROTAS IQ, a custom fine-tuned large language model (LLM), in automating clinical trial eligibility screening for breast cancer patients using synthetic data.
Methods Ten breast cancer trials were selected across diverse treatment settings and molecular subtypes. Fifteen synthetic patient summaries per trial were generated, including realistic and enriched eligibility scenarios. Two independent oncologists assessed trial eligibility for each patient, establishing ground truth. SROTAS IQ was evaluated against expert consensus using standard classification metrics. Time-to-verdict was measured to compare clinician effort with automated assessment.
Results SROTAS IQ demonstrated strong concordance with expert assessments, achieving 90% or greater accuracy in 5 of 10 trials. Across 150 patient-trial evaluations, the model correctly classified 88% of overall eligibility decisions. Performance was highest in trials with moderate complexity and fewer nested criteria, while more intricate protocols showed reduced accuracy. The LLM consistently delivered rapid assessments (<0.5 minutes per patient), with explainable outputs that aligned with clinical reasoning. These findings underscore the model’s potential to support high-fidelity, scalable trial matching in oncology.
Conclusion SROTAS IQ offers a promising approach to automating clinical trial matching in oncology. Further real-world validation is needed to confirm generalisability and integration into clinical practice.
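The abstract does not list which classification metrics were computed. As a minimal sketch, assuming binary eligible/ineligible verdicts and the usual confusion-matrix definitions, the comparison against expert consensus could look like the following. The function name `classification_metrics` and the toy labels are hypothetical and purely illustrative; they are not study data or the authors' code.

```python
from collections import Counter


def classification_metrics(truth, predicted):
    """Compare predicted eligibility verdicts with ground-truth verdicts.

    Both arguments are equal-length sequences of booleans (True = eligible).
    Hypothetical helper: the metric set is an assumption, not the authors' list.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Count each (truth, prediction) pair to build the confusion matrix.
    counts = Counter(zip(truth, predicted))
    tp = counts[(True, True)]    # eligible, predicted eligible
    tn = counts[(False, False)]  # ineligible, predicted ineligible
    fp = counts[(False, True)]   # ineligible, predicted eligible
    fn = counts[(True, False)]   # eligible, predicted ineligible

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels for illustration only (not taken from the study).
toy_truth = [True, True, False, False, True]
toy_predicted = [True, False, False, False, True]
metrics = classification_metrics(toy_truth, toy_predicted)
```

Applied per trial, a helper of this kind would give one accuracy value for each of the ten trials; applied to all 150 patient-trial pairs at once, it would give the pooled figure.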
Competing Interest Statement
Competing Interests: All authors are employees of Srotas Health Ltd and were compensated for their work on this study. Srotas Health Ltd developed the SROTAS IQ system evaluated in this manuscript.
Funding Statement
This study was funded by Srotas Health Ltd and supported in part by a grant from Innovate UK. All authors are employees of Srotas Health Ltd, which provided resources for study design, data generation, model development, analysis, and manuscript preparation. No other external funding or services were received.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This work used only simulated data and utilised only publicly available sources, specifically clinical trial criteria sourced from ClinicalTrials.gov. Synthetic patients were generated with an LLM (Claude Sonnet 4) using prompts provided in Supplementary File 1. No real patient records were used.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.