Machine Learning Methods to Address Confounding in Sepsis Mortality Rate

Document Type

Conference Proceeding

Publication Title

American Journal of Respiratory and Critical Care Medicine


Background/Rationale: Sepsis is a leading cause of hospitalization and death. To address sepsis morbidity and mortality, the Michigan Hospital Medicine Safety Consortium launched a sepsis initiative in 2021 (HMS-Sepsis). We sought to develop and validate a sepsis mortality risk model to facilitate benchmarking of sepsis mortality across HMS-Sepsis hospitals. We evaluated the performance of machine-learning (ML) vs logistic regression (LR) models for predicting 90-day mortality.

Methods: The Hospital Medicine Safety Consortium (HMS) is a Collaborative Quality Initiative sponsored by Blue Cross Blue Shield of Michigan. Professional abstractors at each hospital abstract community-onset sepsis hospitalizations into a central registry to facilitate performance evaluation. Sepsis hospitalizations are identified via a two-step process that incorporates (1) diagnostic coding for infection and (2) clinical criteria indicative of sepsis per the CDC surveillance definition. Using all hospitalizations in the HMS-Sepsis registry (11/2020-09/2022), we fit 90-day mortality models using several ML methods (Gaussian Naïve Bayes, Random Forest (RF), Support Vector Machine, and a deep neural network) and compared their performance to that of an LR model. Candidate predictors included demographic, clinical, and physiologic variables available within 6 hours of presentation, collected from both structured and unstructured data. We assessed model performance using area under the curve (AUC) and Brier score, and quantified variable importance in the best-performing model.

Results: There were 5,303 sepsis hospitalizations from 31 hospitals in the HMS-Sepsis registry during the study period. Ninety-day mortality was 27%. Compared with the LR model (AUC: 0.764, 95% CI 0.749-0.778), all ML models showed superior discriminative ability, with RF being highest (AUC: 0.902, 95% CI 0.882-0.921). All ML models showed higher sensitivity, specificity, positive predictive value, and negative predictive value compared with the reference LR model (P<.001). ML models also had lower Brier scores (indicating greater accuracy). The RF model performed best across all criteria. The variables in the RF model with the greatest prognostic significance, as measured by mean decrease in accuracy, were: age, a weighted comorbidity index, recent prior hospitalization, creatinine, PaO2/FiO2 ratio, and pre-existing functional limitation (Figure 1).

Conclusion: In this multi-hospital cohort of >5,000 sepsis hospitalizations, an RF model provided the most accurate prediction of 90-day mortality. While traditional statistical methods, such as LR, are commonly used for hospital benchmarking, ensemble methods such as RF may provide more accurate risk adjustment. Furthermore, while ML models are often viewed as a “black box”, the use of well-established clinical predictors and the presentation of variable importance data may increase model credibility with end users.
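The two performance measures named in the Methods, AUC and Brier score, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the toy outcome vector, and the two sets of predicted probabilities below are hypothetical and serve only to show how the measures are computed and compared between two models.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0.0 means every prediction was exactly right.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive case (death)
    receives a higher predicted risk than a randomly chosen negative case
    (survivor); tied predictions count as half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died within 90 days, 0 = survived.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
# Hypothetical predicted risks from two models (labels are illustrative only).
probs_model_a = [0.6, 0.4, 0.5, 0.3, 0.2, 0.6, 0.7, 0.1]
probs_model_b = [0.8, 0.2, 0.3, 0.6, 0.1, 0.4, 0.9, 0.2]

for name, probs in [("model A", probs_model_a), ("model B", probs_model_b)]:
    print(f"{name}: AUC={auc(y_true, probs):.3f}, Brier={brier_score(y_true, probs):.3f}")
```

In this toy comparison, the model with the higher AUC also has the lower Brier score, which mirrors the direction of the comparison reported in the Results. The pairwise AUC loop is quadratic in the number of cases, so a rank-based implementation would be preferable for a registry-sized cohort.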





American Thoracic Society International Conference, May 19-24, 2023, Washington, D.C.