Large-scale identification of social and behavioral determinants of health from clinical notes: comparison of Latent Semantic Indexing and Generative Pretrained Transformer (GPT) models.
Document Type
Article
Publication Date
10-10-2024
Publication Title
BMC medical informatics and decision making [electronic resource]
Abstract
BACKGROUND: Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR.
METHODS: Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories was performed using a logistic regression model that included age, gender, race and SBDH ICD-9 codes.
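The LSI ranking step described in METHODS can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes one text document per patient, raw term counts without any weighting, whitespace tokenization, and an arbitrary number of latent dimensions `k`. The function name `lsi_rank` and the toy notes are hypothetical.

```python
import numpy as np

def lsi_rank(docs, query_terms, k=2):
    """Rank documents by cosine similarity to a keyword query in LSI space.

    Illustrative assumptions: raw counts, whitespace tokens, truncated SVD.
    Returns a list of (doc_index, score), highest score first.
    """
    # Build the vocabulary and a term-by-document count matrix A.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0

    # Truncated SVD: keep the k largest singular triplets.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk = U[:, :k]

    # Project documents and the query into the k-dimensional latent space.
    doc_vecs = (Uk.T @ A).T          # one row per document
    q = np.zeros(len(vocab))
    for w in query_terms:
        if w.lower() in index:       # ignore keywords absent from the corpus
            q[index[w.lower()]] += 1.0
    q_vec = Uk.T @ q

    # Cosine similarity between the query and every document.
    eps = 1e-12                      # guards against division by zero
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + eps
    )
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical toy corpus: each string stands in for one patient's notes.
notes = [
    "patient is homeless and lives in a shelter",
    "patient reports tobacco smoking daily",
    "no housing homeless shelter referral made",
    "smoking cessation counseling for tobacco use",
]
for doc_id, score in lsi_rank(notes, ["homeless", "shelter"], k=2):
    print(doc_id, round(score, 3))
```

In this sketch the lexicon for one SBDH category is passed as `query_terms`, and the returned ordering plays the role of the patient ranking. Because the SVD is deterministic, repeated runs on the same corpus give the same ranking.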
RESULTS: LSI retrieved patients according to 15 SBDH domains, with an overall average PPV
CONCLUSIONS: These results demonstrate that the LSI approach performs comparably to more recent large language models, such as GPT-3.5 and GPT-4.0, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, which makes it more amenable to real-world applications in health systems.
Volume
24
Issue
1
First Page
296
Recommended Citation
Roy S, Morrell S, Zhao L, Homayouni R. Large-scale identification of social and behavioral determinants of health from clinical notes: comparison of latent semantic indexing and generative pretrained transformer (gpt) models. BMC Med Inform Decis Mak. 2024 Oct 10;24(1):296. doi: 10.1186/s12911-024-02705-x. PMID: 39390479
DOI
10.1186/s12911-024-02705-x
ISSN
1472-6947
PubMed ID
39390479