Articles

Comparison of Prompt Engineering and Fine-Tuning Strategies in Large Language Models in the Classification of Clinical Notes.

Xiaodan Zhang
Nabasmita Talukdar
Sandeep Vemulapalli, Corewell Health West
Sumyeong Ahn
Jiankun Wang
Han Meng
Sardar Mehtab Bin Murtaza
Dmitry Leshchiner
Aakash Ajay Dave
Dimitri F Joseph

Document Type

Article

Publication Date

5-31-2024

Publication Title

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

Abstract

Emerging large language models (LLMs) are being actively evaluated in various fields, including healthcare. Most studies have focused on established benchmarks and standard parameters; however, the variation and impact of prompt engineering and fine-tuning strategies have not been fully explored. This study benchmarks GPT-3.5 Turbo, GPT-4, and Llama-7B against BERT models and medical fellows' annotations in identifying patients with metastatic cancer from discharge summaries. Results revealed that clear, concise prompts incorporating reasoning steps significantly enhanced performance. GPT-4 exhibited superior performance among all models. Notably, one-shot learning and fine-tuning provided no incremental benefit. The model's accuracy was sustained even when keywords for metastatic cancer were removed or when half of the input tokens were randomly discarded. These findings underscore GPT-4's potential to substitute for specialized models, such as PubMedBERT, through strategic prompt engineering, and suggest opportunities to improve open-source models, which are better suited for use in clinical settings.
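
The two input perturbations mentioned in the abstract (removing metastatic-cancer keywords and randomly discarding half of the input tokens) can be illustrated with a minimal Python sketch. This is not the authors' code: the keyword list, the whitespace tokenization, the function names, and the sample note are all assumptions made for illustration, and the study's actual keywords and tokenizer are not given in this record.

```python
import random
import re

# Hypothetical keyword list; the study's actual list is not stated in the abstract.
KEYWORDS = ["metastatic", "metastasis", "metastases"]


def remove_keywords(text, keywords=KEYWORDS):
    """Delete whole-word, case-insensitive keyword matches and tidy whitespace."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b",
        re.IGNORECASE,
    )
    return " ".join(pattern.sub(" ", text).split())


def drop_random_tokens(text, keep_fraction=0.5, seed=0):
    """Keep a random subset of whitespace tokens, preserving their original order.

    Whitespace splitting is an assumption; a model-specific tokenizer
    could be substituted.
    """
    tokens = text.split()
    n_keep = round(len(tokens) * keep_fraction)
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    kept = sorted(rng.sample(range(len(tokens)), n_keep))
    return " ".join(tokens[i] for i in kept)


# Invented example sentence, not taken from any clinical dataset.
note = "CT shows metastatic disease in the liver with new metastases to bone"
print(remove_keywords(note))
print(drop_random_tokens(note))
```

Each perturbed note would then be passed to the classifier with the same prompt as the unperturbed note, so that any change in accuracy can be attributed to the missing text.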

Volume

2024

First Page

478

Last Page

487

Recommended Citation

Zhang X, Talukdar N, Vemulapalli S, Ahn S, Wang J, Meng H, et al. Comparison of prompt engineering and fine-tuning strategies in large language models in the classification of clinical notes. AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:478-487. PMID: 38827053

ISSN

2153-4063

PubMed ID

38827053

