Deep Learning For Contour Quality Assurance on RTOG 0933
Document Type
Conference Proceeding
Publication Date
11-1-2022
Publication Title
International Journal of Radiation Oncology Biology Physics
Abstract
Purpose/Objective(s)
To evaluate a CT-based deep learning (DL) hippocampal segmentation model, trained on a single-institutional dataset and tested on the RTOG 0933 dataset, and to explore its potential for multi-institutional contour quality assurance (QA).
Materials/Methods
An attention-gated 3D ResNet DL model was trained for semantic segmentation of the left (L) and right (R) hippocampus on a 390-patient single-institution Gamma Knife cohort, using institutional observer (IO) contours as ground truth. The model was then evaluated on the RTOG 0933 dataset by comparing its contours to both the treating physician (TP) contours and blinded IO contours via Dice coefficient and Hausdorff distance (HD). The sensitivity and specificity of the DL model in capturing discrepancies between the TP contour and the IO contour (a surrogate for central review contours) were assessed. Hippocampal avoidance whole brain radiotherapy plans were generated. The ability of DL and IO to identify unacceptable deviations of TP plans (per RTOG 0933 defined constraints) was assessed via the Wilcoxon signed-rank (WSR) test and Cochran's Q test.
Results
The DL model showed significantly greater agreement with IO contours compared to TP contours (DL:IO L/R Dice 73%/74%, HD 4.86/4.74 mm; DL:TP L/R Dice 62%/65%, HD 7.23/6.94 mm, all p<0.001). Using the RTOG protocol-defined passing metric of HD<7 mm as an agreement threshold, the DL model achieved an AUC L/R 0.80/0.79 in ability to discriminate TP contours from IO contours, with a false-negative rate of 17.2%/20.5%. WSR revealed that, when limited to subjects meeting the HD<7 mm agreement threshold, DL and IO chose populations that were not dosimetrically different from TP. When limited to subjects failing HD<7 mm, DL and IO chose populations with significant differences in hippocampal maximum doses (WSR=18.0, p=0.001; WSR=7.0, p=0.002) and PTV D98% (WSR=61.5, p=0.033; WSR=15, p=0.002) from TP. Cochran's Q showed no statistical difference between DL and IO in the rate of identification of RTOG-defined acceptable contours from TP (34.33, p=0.311).
Conclusion
Our study demonstrates the feasibility of using a single-institutional DL model to perform contour QA on a multi-institutional trial for the task of hippocampal segmentation. The DL model was capable of discriminating contours generated by treating physicians from a central reviewer and was able to identify a dosimetrically comparable population to the central reviewer. Further study is needed to assess optimal quality metrics and the generalizability of DL for contour QA.
Volume
114
Issue
3 Suppl
First Page
e121
Last Page
e122
Recommended Citation
Mumaw D, Porter E, Vu CC, Fuentes P, Sala IM, Myziuk NK, et al. [Siddiqui ZA, Guerrero TM]. Deep learning for contour quality assurance on RTOG 0933. Int J Radiat Oncol Biol Phys. 2022 Nov 1;114(3 Suppl):e121-e122. doi:10.1016/j.ijrobp.2022.07.941.
DOI
10.1016/j.ijrobp.2022.07.941
Comments
American Society for Radiation Oncology (ASTRO) Annual Meeting, October 23-26, 2022, San Antonio, TX.
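
As a rough illustration of the two contour-agreement metrics named in the abstract, the sketch below computes a Dice coefficient and a symmetric Hausdorff distance (HD) between two binary masks, then applies the HD<7 mm passing threshold. It is not the authors' implementation. The function names, the `spacing_mm` parameter, the toy masks, and the choice to measure HD over all mask voxels rather than surface points are assumptions made for this example.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks (1.0 = identical, 0.0 = disjoint)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def hausdorff_distance_mm(mask_a, mask_b, spacing_mm=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance in mm between the voxels of two masks.

    Assumption: distances are taken over every voxel in each mask using a
    brute-force pairwise computation, which is only practical for small
    structures such as a hippocampus.
    """
    spacing = np.asarray(spacing_mm, dtype=float)
    pts_a = np.argwhere(np.asarray(mask_a, dtype=bool)) * spacing
    pts_b = np.argwhere(np.asarray(mask_b, dtype=bool)) * spacing
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise Euclidean distances, shape (len(pts_a), len(pts_b)).
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    a_to_b = dists.min(axis=1).max()  # farthest point of A from B
    b_to_a = dists.min(axis=0).max()  # farthest point of B from A
    return float(max(a_to_b, b_to_a))


# Toy 5x5x5 masks (hypothetical data): two 2x2x2 cubes sharing one voxel.
reference = np.zeros((5, 5, 5), dtype=bool)
candidate = np.zeros((5, 5, 5), dtype=bool)
reference[1:3, 1:3, 1:3] = True
candidate[2:4, 2:4, 2:4] = True

dice = dice_coefficient(reference, candidate)
hd = hausdorff_distance_mm(reference, candidate, spacing_mm=(1.0, 1.0, 1.0))
passes_threshold = hd < 7.0  # HD<7 mm agreement threshold from the abstract
print(f"Dice={dice:.3f}, HD={hd:.3f} mm, passes={passes_threshold}")
```

In practice the voxel spacing would come from the CT image header, and a surface-based or percentile HD may be preferred, since the maximum is sensitive to single outlying voxels.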