Evaluation of a digital ophthalmologist app built by GPT4-Vision

Abstract

Background: GPT4-Vision (GPT4V) has generated great interest across various fields, but its performance on ocular multimodal images remains unknown. This study aimed to evaluate the capabilities of a GPT4V-based chatbot in addressing queries related to ocular multimodal images. Methods: A digital ophthalmologist app was built on GPT4V. The evaluation dataset comprised six ocular imaging modalities: slit-lamp photography, scanning laser ophthalmoscopy (SLO), fundus photography of the posterior pole (FPP), optical coherence tomography (OCT), fundus fluorescein angiography (FFA), and ocular ultrasound (OUS). Each modality included images representing 5 common and 5 rare diseases. The chatbot was presented with ten questions per image, covering examination identification, lesion detection, diagnosis, decision support, and repeatability of diagnosis. GPT4V's responses were evaluated for accuracy, usability, and safety. Results: Agreement among the three ophthalmologist graders was substantial. Of 600 responses, 30.5% were accurate; of 540 responses, 22.8% were rated highly usable and 55.5% were considered safe by the ophthalmologists. The chatbot performed best on slit-lamp images, with 42.0%, 42.2%, and 68.5% of responses rated accurate, highly usable, and harmless, respectively. Its performance was notably weaker on FPP images, at only 13.7%, 3.7%, and 38.5% in the same categories. It correctly identified 95.6% of imaging modalities. For lesion identification, diagnosis, and decision support, the chatbot's accuracy was 25.6%, 16.1%, and 24.0%, respectively. For common diseases, the average proportions of accurate, highly usable, and harmless responses were 37.9%, 30.5%, and 60.1%, respectively, all higher than the corresponding proportions for rare diseases: 23.2% (P<0.001), 15.2% (P<0.001), and 51.1% (P=0.032). The overall repeatability of GPT4V in diagnosing ocular images was 63% (38/60). Conclusion: Currently, GPT4V lacks the reliability required for clinical decision-making and patient consultation in ophthalmology. Ongoing refinement and testing remain essential for improving the efficacy of large language models in medical applications.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was supported by the Global STEM Professorship Scheme (P0046113).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study adheres to the tenets of the Declaration of Helsinki. The ethics committee of the Zhongshan Ophthalmic Center approved the study (No. 2021KYPJ164-3), and individual consent for this retrospective analysis was waived.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The data of this study are available from the corresponding author on reasonable request.
