Adaptive undersampling and short clip-based two-stream CNN-LSTM model for surgical phase recognition on cholecystectomy videos
- Authors
- Lee, Sang-Goo; Kim, Ga-Young; Hwang, Yoo-Na; Kwon, Ji-Yean; Kim, Sung-Min
- Issue Date
- Feb-2024
- Publisher
- Elsevier BV
- Keywords
- Automated surgical phase recognition; Cholecystectomy; Endoscopic video; Short-clip-based; Two-stream CNN-LSTMs; Undersampling
- Citation
- Biomedical Signal Processing and Control, v.88, pp 1 - 9
- Pages
- 9
- Indexed
- SCIE; SCOPUS
- Journal Title
- Biomedical Signal Processing and Control
- Volume
- 88
- Start Page
- 1
- End Page
- 9
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/21399
- DOI
- 10.1016/j.bspc.2023.105637
- ISSN
- 1746-8094; 1746-8108
- Abstract
- Surgical phase recognition is challenging because imbalanced data among surgical phases cause overfitting. We propose an adaptive sampling rate-based undersampling method that yields a similar number of samples for each surgical phase to alleviate biased learning. To further improve performance, we also introduce a two-stream CNN-LSTM model that extracts temporal information on behavioral changes between image frames. First, we extracted a total of 40,236 short clips from the entire video using an adaptive subsampling rate. Each short clip was fed into a pre-trained GoogLeNet. The resulting visual features were then fed into a sequence-to-sequence LSTM model to extract temporal information from neighboring frames within a short clip. At the same time, a sequence-to-vector LSTM was used to extract temporal information from all successive image frames and predict the final surgical phase. The proposed method was evaluated on the public Cholec80 dataset. It outperformed state-of-the-art methods, achieving an F1-score of 87.12% and an AUC of 98.00%. In addition, the F1-score deviation across phases decreased by about 10% compared with that before undersampling. Experimental results confirmed that the proposed method learns rich temporal information from short clips and outperforms the conventional one-stream CNN-LSTM architecture.
- Appears in Collections
- Graduate School > Department of Medical Device Business > 1. Journal Articles
- College of Life Science and Biotechnology > Department of Biomedical Engineering > 1. Journal Articles
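
The adaptive undersampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the per-phase sampling rate is computed, so the rule used here (stride = phase frame count divided by the smallest phase count) is an assumption, as are the function names `adaptive_strides` and `undersample` and the phase labels `P1` to `P3`. Each kept index could serve as the starting frame of a short clip.

```python
from collections import Counter, defaultdict


def adaptive_strides(labels, target=None):
    """Return a per-phase sampling stride (assumed rule, not from the paper).

    Phases with more frames get a larger stride, so that each phase
    contributes roughly `target` samples after undersampling.
    """
    counts = Counter(labels)
    if target is None:
        # Assumption: balance every phase down to the rarest phase.
        target = min(counts.values())
    return {phase: max(1, n // target) for phase, n in counts.items()}


def undersample(labels, target=None):
    """Return the frame indices kept after phase-wise strided sampling."""
    strides = adaptive_strides(labels, target)
    seen = defaultdict(int)  # frames seen so far, per phase
    kept = []
    for idx, phase in enumerate(labels):
        # Keep every stride-th frame of this phase.
        if seen[phase] % strides[phase] == 0:
            kept.append(idx)
        seen[phase] += 1
    return kept


# Hypothetical, heavily imbalanced per-frame phase labels.
labels = ["P1"] * 100 + ["P2"] * 400 + ["P3"] * 50
kept = undersample(labels)
balanced = Counter(labels[i] for i in kept)
print(adaptive_strides(labels))
print(balanced)
```

With these toy labels, the long phase `P2` is sampled with the largest stride and the rare phase `P3` is kept in full, so the three phases end up with the same number of samples. Real videos would not divide this evenly, and the resulting counts would only be approximately balanced.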
