Artificial intelligence enabled automated diagnosis and grading of ulcerative colitis endoscopy images

Sci Rep. 2022 Feb 17;12(1):2748. doi: 10.1038/s41598-022-06726-2.

Reed T Sutton 1, Osmar R Zaïane 2 3, Randolph Goebel 2 3, Daniel C Baumgart 4 5

Author information

1 Division of Gastroenterology, University of Alberta, 130 University Campus, Edmonton, AB, T6G 2X8, Canada.

2 Department of Computing Science, University of Alberta, Edmonton, AB, Canada.

3 Alberta Machine Intelligence Institute, University of Alberta, Edmonton, AB, Canada.

4 Division of Gastroenterology, University of Alberta, 130 University Campus, Edmonton, AB, T6G 2X8, Canada. baumgart@ualberta.ca.

5 Department of Computing Science, University of Alberta, Edmonton, AB, Canada. baumgart@ualberta.ca.

Abstract

Endoscopic evaluation to reliably grade disease activity, detect complications including cancer, and verify mucosal healing is paramount in the care of patients with ulcerative colitis (UC), but this evaluation is hampered by substantial intra- and interobserver variability. Recently, artificial intelligence methodologies have been proposed to facilitate more objective, reproducible endoscopic assessment. In a first step, we compared how well several deep learning convolutional neural network (CNN) architectures, applied to a diverse subset of 8000 labeled endoscopic still images derived from HyperKvasir, could (1) accurately distinguish UC from non-UC pathologies and (2) inform the Mayo score of endoscopic disease severity. HyperKvasir, the largest multi-class image and video dataset from the gastrointestinal tract available today, includes 110,079 images and 374 videos. We grouped 851 UC images labeled with a Mayo score of 0-3 into an inactive/mild (236) and moderate/severe (604) dichotomy. Weights were initialized from ImageNet pretraining, and grid search with fivefold cross-validation was used to identify the best hyperparameters. The best accuracy (87.50%) and area under the curve (AUC) (0.90) were achieved using the DenseNet121 architecture, compared to 72.02% and 0.50 obtained by always predicting the majority class (the 'no skill' model). Finally, we used Gradient-weighted Class Activation Maps (Grad-CAM) to improve visual interpretation of the model and take an explainable artificial intelligence (XAI) approach.
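
To make the 'no skill' comparison concrete, the following minimal Python sketch illustrates the Mayo score dichotomy and the majority-class baseline. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the cut-off (scores 0-1 versus 2-3) follows the standard Mayo endoscopic subscore grading, and the labels are rebuilt from the two group sizes given in the abstract rather than from the actual evaluation split, which the abstract does not specify. AUC is taken here to mean the area under the ROC curve.

```python
from collections import Counter


def dichotomize_mayo(score: int) -> int:
    """Map a Mayo endoscopic subscore to a binary label.

    0 = inactive, 1 = mild      -> class 0 (inactive/mild)
    2 = moderate, 3 = severe    -> class 1 (moderate/severe)
    """
    if score not in (0, 1, 2, 3):
        raise ValueError(f"Mayo endoscopic subscore must be 0-3, got {score}")
    return int(score >= 2)


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a
    random negative, counting ties as one half (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Assumption: labels reconstructed from the group sizes in the abstract
# (236 inactive/mild, 604 moderate/severe), not from the study's test split.
labels = [0] * 236 + [1] * 604

# A majority-class model gives every image the same score, so it cannot
# rank positives above negatives.
constant_scores = [1.0] * len(labels)

baseline_accuracy = majority_baseline_accuracy(labels)
baseline_auc = roc_auc(labels, constant_scores)

print(f"baseline accuracy: {baseline_accuracy:.2%}")
print(f"baseline AUC: {baseline_auc:.2f}")
```

With these group sizes the baseline accuracy is 604/840, about 71.9%, close to but not identical to the reported 72.02%. The AUC of a constant predictor is 0.50 by construction, which is why the 'no skill' model is the reference point for DenseNet121's 0.90.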
