Variability in Plus Disease Diagnosis using Single and Serial Images

Publication Title

Ophthalmology Retina


PURPOSE: To assess changes in retinopathy of prematurity (ROP) diagnosis in single and serial retinal images.

DESIGN: Cohort study.

PARTICIPANTS: Cases of ROP recruited from the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) consortium evaluated by 7 graders.

METHODS: Seven ophthalmologists reviewed both single images and sets of 3 consecutive serial retinal images from 15 cases with ROP, and graded severity as plus, preplus, or none. Imaging data were acquired during routine ROP screening from 2011 to 2015, and a reference standard diagnosis was established for each image. In a secondary analysis, the i-ROP deep learning system assigned each image a vascular severity score (VSS) ranging from 1 to 9, with 9 indicating the most severe disease. This score has previously been shown to correlate with the International Classification of ROP. To reduce noise, mean plus disease severity was calculated by averaging 14 labels per image in serial and single images.
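
The averaging step described in the methods can be illustrated with a minimal Python sketch. The numeric coding below (none = 1, preplus = 2, plus = 3) is an assumption made for illustration only, since the abstract does not state the scale used, and the 14 example labels are made up rather than taken from the study.

```python
from statistics import mean

# Hypothetical ordinal coding; the abstract does not specify the numeric scale.
SEVERITY = {"none": 1, "preplus": 2, "plus": 3}


def mean_plus_severity(labels):
    """Average categorical plus-disease labels on the assumed ordinal scale."""
    if not labels:
        raise ValueError("at least one label is required")
    return mean(SEVERITY[label] for label in labels)


# Made-up example: 14 labels assigned to a single image.
example_labels = ["preplus"] * 8 + ["plus"] * 4 + ["none"] * 2
score = mean_plus_severity(example_labels)
```

Averaging many labels per image yields a continuous severity value, which is what allows comparison against a continuous score such as the VSS.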

MAIN OUTCOME MEASURES: Grading severity of ROP as defined by plus, preplus, or no ROP.

RESULTS: Assessment of serial retinal images changed the grading severity for more than 50% of the graders, although there was wide variability. Cohen's kappa ranged from 0.29 to 1.0, indicating agreement that ranged from slight to perfect across individual graders. Changes in the grading of serial retinal images were noted more commonly in cases of preplus disease. The mean severity in cases with a diagnosis of plus disease and no disease did not change between single and serial images. The ROP VSS demonstrated good correlation with the range of expert classifications of plus disease and overall agreement with the mode class (P = 0.001). The VSS correlated with mean plus disease severity by expert diagnosis (correlation coefficient, 0.89). The more aggressive graders tended to be influenced by serial images to increase the severity of their grading. The VSS also demonstrated agreement with disease progression across serial images, which progressed to preplus and plus disease.
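
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of unweighted Cohen's kappa, computed here between one grader's single-image and serial-image labels. Whether the study used an unweighted or weighted kappa is not stated in the abstract, so the unweighted form is an assumption, and the example grades are made up.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: derived from the marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both lists use a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up grades from one hypothetical grader for six cases.
single = ["none", "none", "preplus", "preplus", "plus", "plus"]
serial = ["none", "preplus", "preplus", "plus", "plus", "plus"]
kappa = cohens_kappa(single, serial)
```

In this made-up example the grader agrees with themselves on 4 of 6 cases, and both disagreements shift toward a more severe grade when serial images are shown.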

CONCLUSIONS: Clinicians demonstrated variability in ROP diagnosis when presented with both single and serial images. The use of deep learning as a quantitative assessment of plus disease has the potential to standardize ROP diagnosis and treatment.




