Automated Ulcer Severity Grading Model Accurate in Crohn Disease

A team of investigators assessed the accuracy of a machine learning model in grading the severity of ulcers in patients with Crohn disease.

A deep learning model was found to have high accuracy for detecting severe ulcerations associated with Crohn disease on capsule endoscopy images, according to findings from a computational study conducted by researchers in Israel and published in Gastrointestinal Endoscopy.

Images (N=17,640) from 49 patients were collected during capsule endoscopy for a previous study. The images included a mixture of those depicting mucosal ulcers (n=7391) and normal images (n=10,249). Each ulcer was first graded by 2 readers on the basis of the PillCam Crohn disease classification (grade 1 was defined as small and superficial ulcers, grade 2 as having an intermediate size and depth, and grade 3 as having circumferential, cobblestone, or “kissing” morphologies). Next, a consensus reading was performed by 3 readers, and the grades of 1242 images were used to train the ordinal neural network. The model was tested using 248 images.

During the first grading phase, agreement between the 2 readers, who had experience reading >1000 and >10,000 capsules, respectively, was 31%. For the distinction between grades 1 and 2, agreement was 40%, and for the distinction between grades 2 and 3, it was 36%.

The overall agreement between the human consensus reading and the model was 67%. When distinguishing grade 1 from grade 3 ulcers, the automatic reading reached 91% agreement, with an area under the receiver operating characteristic curve (AUC) of 0.958, specificity of 0.91, and sensitivity of 0.91. For grades 2 and 3, the AUC was 0.939, specificity 0.73, and sensitivity 0.91. When comparing grades 1 and 2, the AUC was 0.34, specificity 0.34, and sensitivity 0.71.
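For readers less familiar with these metrics, sensitivity and specificity for a two-grade comparison can be computed from a 2x2 table of model calls against the consensus grade. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the study, and treating the higher grade as the "positive" class is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)  # higher-grade ulcers correctly flagged
    specificity = tn / (tn + fp)  # lower-grade ulcers correctly left unflagged
    return sensitivity, specificity


# Hypothetical counts for illustration, NOT study data.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The AUC summarizes the same trade-off across all possible decision thresholds rather than at a single cutoff, so it cannot be derived from one 2x2 table alone.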

Classification of individual ulcer images was compared between the human readers and the automatic model; the same grade was assigned to 76 grade 1 images, 8 grade 2 images, and 82 grade 3 images. The automatic model categorized a further 28 images as grade 1, 45 images as grade 2, and 9 images as grade 3, whereas the human readers assigned different grades to these images.
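The per-grade counts above account for all 248 test images and reproduce the reported 67% overall agreement. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the counts are the figures reported in this article.

```python
# Test images where the model and the human readers assigned the same grade.
concordant = {1: 76, 2: 8, 3: 82}
# Test images the model placed in each grade while the human readers chose a different grade.
discordant = {1: 28, 2: 45, 3: 9}

n_agree = sum(concordant.values())
n_total = n_agree + sum(discordant.values())
overall_agreement = n_agree / n_total

print(f"{n_agree}/{n_total} images agree ({overall_agreement:.0%})")

# Share of the model's calls at each grade that matched the human grade.
for grade in (1, 2, 3):
    share = concordant[grade] / (concordant[grade] + discordant[grade])
    print(f"grade {grade}: {share:.0%} of model calls matched")
```

The totals come to 166 of 248 images, or about 67%. The grade 2 row has by far the lowest share of matching calls (8 of 53), in line with the difficulty reported for intermediate ulcers.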

A limitation of this study was the relatively small overall number of images. Furthermore, because the images did not cover the entire bowel length, the researchers were unable to incorporate endoscopic severity scores, such as the Lewis score, into their model.

The study authors concluded that their machine learning model had high accuracy for the detection of severe Crohn disease ulcers on capsule endoscopy images. The automatic model had difficulty distinguishing between intermediate and mild ulcers; however, distinguishing between these ulcers was similarly challenging for the 2 human readers.

Disclosure: Multiple authors declared affiliations with industry. Please refer to the original article for a full list of disclosures.


Yiftach B, Liran A, Shelly S, et al. Ulcer severity grading in video-capsule images of Crohn’s disease patients: an ordinal neural network solution [published online June 11, 2020]. Gastrointest Endosc. doi:10.1016/j.gie.2020.05.066