Video Analysis of Small Bowel Capsule Endoscopy Using a Transformer Network

Citations

Web of Science: 3
Scopus: 4

Abstract

Although wireless capsule endoscopy (WCE) detects small bowel diseases effectively, it has limitations. The reading process is time-consuming because each case generates a large number of images, and lesion detection accuracy may depend on the operator's skill and experience. Many researchers have therefore recently developed deep-learning-based methods to address these limitations. However, these methods tend to select only a portion of the images from a given WCE video and analyze each image individually. In this study, we note that more information can be extracted from the unused frames and from the temporal relations between sequential frames. Specifically, to increase lesion detection accuracy without depending on experts' frame selection skills, we suggest using all video frames as the input to the deep learning system. We therefore propose a new Transformer-architecture-based neural encoder that takes the entire video as input, exploiting the power of the Transformer architecture to extract long-term global correlations within and between the input frames. In this way, we can capture both the temporal context of the input frames and the attentional features within a frame. Tests on benchmark datasets of four WCE videos showed 95.1% sensitivity and 83.4% specificity. These results may significantly advance automated lesion detection techniques for WCE images. © 2023 by the authors.
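
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how sensitivity and specificity are computed from binary per-frame labels. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions only; they are not the authors' evaluation code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = lesion, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lesions correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lesions missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal frames correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal frames wrongly flagged
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels, not taken from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

High sensitivity with lower specificity, as in the reported figures, means few lesion frames are missed at the cost of more false alarms on normal frames.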

Keywords

artificial intelligence; capsule endoscopy; transformer; video-analysis
Title
Video Analysis of Small Bowel Capsule Endoscopy Using a Transformer Network
Authors
Oh, SangYup; Oh, DongJun; Kim, Dongmin; Song, Woohyuk; Hwang, Youngbae; Cho, Namik; Lim, Yun Jeong
DOI
10.3390/diagnostics13193133
Publication date
2023-10
Type
Article
Journal
Diagnostics
Volume
13
Issue
19
Pages
1-12