Objective Systematic reviews (SRs) are essential for evidence-based practice but remain labor-intensive, especially during title and abstract screening. This study evaluates whether multiple large language model (multi-LLM) collaboration can improve screening prioritization while reducing costs.
Methods Abstract screening was framed as a question-answering (QA) task using cost-effective LLMs. Three multi-LLM collaboration strategies were proposed: majority voting by averaging the opinions of peer models, multi-agent debate (MAD) for answer refinement, and LLM-based adjudication over the answers of individual QA baselines. These strategies were evaluated on the CLEF eHealth 2019 Technology-Assisted Review benchmark using standard performance metrics in the domain, including Mean Average Precision (MAP), Recall@k%, and Work Saved over Sampling (WSS).
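As an illustration of the voting strategy described in the Methods, the following is a minimal sketch of majority voting by score averaging. It assumes each model returns a relevance score in [0, 1] for every abstract (for example, 1.0 for "yes" and 0.0 for "no"); the function name `rank_by_average_vote`, the model names, and the document IDs are hypothetical and are not taken from the study.

```python
from statistics import mean


def rank_by_average_vote(model_scores):
    """Rank documents by the mean relevance score across models.

    model_scores: dict mapping a model name to a dict of
    {doc_id: score in [0, 1]}. This input layout is an assumption
    made for illustration only.
    """
    if not model_scores:
        raise ValueError("at least one model is required")

    # Only keep documents that every model has scored.
    per_model = list(model_scores.values())
    doc_ids = set(per_model[0])
    for scores in per_model[1:]:
        doc_ids &= set(scores)

    # Average the peer opinions for each document.
    averaged = {d: mean(scores[d] for scores in per_model) for d in doc_ids}

    # Highest average first; ties broken by doc_id for determinism.
    return sorted(averaged, key=lambda d: (-averaged[d], d))


# Hypothetical scores from three models for three abstracts.
example_scores = {
    "model_a": {"d1": 1.0, "d2": 0.0, "d3": 1.0},
    "model_b": {"d1": 1.0, "d2": 0.0, "d3": 0.0},
    "model_c": {"d1": 1.0, "d2": 0.0, "d3": 1.0},
}
ranking = rank_by_average_vote(example_scores)
```

Averaging rather than taking a hard vote yields a graded score per abstract, which is what a screening prioritization (ranking) task needs.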
Results Multi-LLM collaboration strategies significantly outperformed QA baselines. Majority voting performed best, achieving the highest MAP of 0.4621 and 0.3409 on the subsets of SRs on clinical intervention and diagnostic technology assessment, respectively, with WSS@95% of 0.6064 and 0.6798, which in theory enables up to a 68% workload reduction at 95% recall of all included studies. MAD improved weaker models the most. The adjudicator-as-a-ranker method surpassed adjudicator-as-a-judge and was the second strongest approach, but at a significantly higher cost than majority voting and debating.
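To make the reported metrics concrete, the sketch below computes WSS at a target recall and average precision for a single ranked list. It assumes the commonly used definition WSS@r = (N - k_r) / N - (1 - r), where N is the number of candidate studies and k_r is the number that must be screened, in ranked order, to reach recall r; the exact formulation used in the study is not given in the abstract, and the function names and example labels are hypothetical.

```python
import math


def wss_at_recall(ranked_labels, recall=0.95):
    """Work Saved over Sampling for one ranked list.

    ranked_labels: list of 1 (included) / 0 (excluded), best-ranked first.
    Uses WSS@r = (N - k_r) / N - (1 - r), an assumed standard definition.
    """
    n = len(ranked_labels)
    total_included = sum(ranked_labels)
    if n == 0 or total_included == 0:
        raise ValueError("need a non-empty list with at least one included study")

    # Number of included studies that must be found to reach the target recall
    # (small tolerance guards against floating-point overshoot).
    needed = math.ceil(recall * total_included - 1e-9)
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return (n - position) / n - (1.0 - recall)


def average_precision(ranked_labels):
    """Mean of the precision values at each included study's rank."""
    hits = 0
    precisions = []
    for position, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical ranking of 10 abstracts, 3 of which are included.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
wss95 = wss_at_recall(labels, recall=0.95)
ap = average_precision(labels)
```

In this toy list all three included studies are found after screening 4 of 10 abstracts, so WSS@95% is 0.6 - 0.05 = 0.55. MAP is the mean of `average_precision` over all reviews in a benchmark subset.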
Conclusion Multi-LLM collaboration can substantially improve abstract screening efficiency, and its success lies in model diversity. By making the best use of that diversity, majority voting stands out for both strong performance and low cost compared to adjudication. Despite context-dependent gains and diminishing model diversity, MAD is still a cost-effective strategy and a potential direction for further research.
Competing Interest Statement The authors have declared no competing interest.
Funding Statement O.A. is fully funded by a PhD scholarship at Coventry University. X.J. is supported by the International Exchange Scheme of the Royal Society of the United Kingdom (IESR1231175).
Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability All data produced in the present study are available upon reasonable request to the authors.