Articles

Performance assessment of an artificial intelligence chatbot in clinical vitreoretinal scenarios

Michael J. Maywood, Corewell Health East Resident
Ravi Parikh
Avnish Deobhakta
Tedi Begaj, Corewell Health East

Document Type

Article

Publication Date

6-1-2024

Publication Title

Retina (Philadelphia, Pa.)

Abstract

PURPOSE: To determine how often ChatGPT is able to provide accurate and comprehensive information regarding clinical vitreoretinal scenarios. To assess the types of sources ChatGPT primarily uses and to determine whether they are hallucinated.

METHODS: This was a retrospective cross-sectional study. The authors designed 40 open-ended clinical scenarios across four main topics in vitreoretinal disease. Responses were graded on correctness and comprehensiveness by three blinded retina specialists. The primary outcome was the number of clinical scenarios that ChatGPT answered correctly and comprehensively. Secondary outcomes included theoretical harm to patients, the distribution of the type of references used by the chatbot, and the frequency of hallucinated references.

RESULTS: In June 2023, ChatGPT answered 83% of clinical scenarios (33/40) correctly but provided a comprehensive answer in only 52.5% of cases (21/40). Subgroup analysis demonstrated an average correct score of 86.7% in neovascular age-related macular degeneration, 100% in diabetic retinopathy, 76.7% in retinal vascular disease, and 70% in the surgical domain. There were six incorrect responses with one case (16.7%) of no harm, three cases (50%) of possible harm, and two cases (33.3%) of definitive harm.

CONCLUSION: ChatGPT correctly answered more than 80% of complex open-ended vitreoretinal clinical scenarios, with a reduced capability to provide a comprehensive response.

Volume

44

Issue

6
First Page

954

Last Page

964

Recommended Citation

Maywood MJ, Parikh R, Deobhakta A, Begaj T. Performance assessment of an artificial intelligence chatbot in clinical vitreoretinal scenarios. Retina. 2024 Jun 1;44(6):954-964. doi: 10.1097/IAE.0000000000004053. PMID: 38271674.

DOI

10.1097/IAE.0000000000004053

ISSN

1539-2864

PubMed ID

38271674