Semantic-Guided Spatial and Temporal Fusion Framework for Enhancing Monocular Video Depth Estimation
Abstract

Despite advances in deep learning-based Monocular Depth Estimation (MDE), applying these models to video sequences remains challenging due to geometric ambiguities in texture-less regions and temporal instability caused by independent per-frame inference. To address these limitations, we propose STF-Depth, a novel post-processing framework that enhances depth quality by logically fusing heterogeneous information (geometric, semantic, and panoptic) without requiring additional retraining. Our approach introduces a robust RANSAC-based Vanishing Point Estimation to guide Dynamic Depth Gradient Correction for background separation, alongside Adaptive Instance Re-ordering to clarify occlusion relationships. Experimental results on the KITTI, NYU Depth V2, and TartanAir datasets demonstrate that STF-Depth functions as a universal plug-and-play module. Notably, it achieved a 25.7% reduction in Absolute Relative Error (AbsRel) and significantly improved temporal consistency compared to state-of-the-art backbone models. These findings confirm the framework's practicality for real-world applications that require geometric precision and video stability, such as autonomous driving, robotics, and augmented reality (AR).
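The abstract names two concrete technical anchors: the RANSAC-based vanishing point (VP) estimation that drives the geometric correction, and the Absolute Relative Error metric (AbsRel = (1/N) Σ |d̂_i − d_i| / d_i, a standard MDE benchmark metric) behind the reported 25.7% reduction. The paper's own implementation is not reproduced here; the following is a minimal illustrative sketch assuming 2D line segments have already been detected by an off-the-shelf detector. The function names (`ransac_vanishing_point`, `absrel`), the midpoint-direction inlier test, and all thresholds are assumptions for illustration, not the authors' code.

```python
import numpy as np

def absrel(pred, gt, eps=1e-6):
    """Standard Absolute Relative Error: mean of |pred - gt| / gt over valid pixels."""
    valid = gt > eps
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def ransac_vanishing_point(segments, n_iters=500, angle_thresh_deg=2.0, seed=None):
    """Illustrative RANSAC VP estimation (not the paper's implementation).

    segments: (N, 4) array of line-segment endpoints (x1, y1, x2, y2).
    Returns the candidate intersection point with the largest inlier count.
    """
    rng = np.random.default_rng(seed)
    n = len(segments)
    # Each segment becomes a homogeneous line via the cross product of its
    # two endpoints lifted to homogeneous coordinates.
    p1 = np.concatenate([segments[:, :2], np.ones((n, 1))], axis=1)
    p2 = np.concatenate([segments[:, 2:], np.ones((n, 1))], axis=1)
    lines = np.cross(p1, p2)

    mids = (segments[:, :2] + segments[:, 2:]) / 2.0
    dirs = segments[:, 2:] - segments[:, :2]
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True) + 1e-12

    best_vp, best_inliers = None, -1
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    for _ in range(n_iters):
        i, j = rng.choice(n, size=2, replace=False)
        vp = np.cross(lines[i], lines[j])   # candidate intersection point
        if abs(vp[2]) < 1e-9:               # near-parallel pair: no finite VP
            continue
        vp_xy = vp[:2] / vp[2]
        # Inlier test: the direction from a segment's midpoint toward the
        # candidate VP should align with the segment's own direction.
        to_vp = vp_xy[None, :] - mids
        to_vp /= np.linalg.norm(to_vp, axis=1, keepdims=True) + 1e-12
        inliers = int(np.sum(np.abs(np.sum(to_vp * dirs, axis=1)) > cos_thresh))
        if inliers > best_inliers:
            best_vp, best_inliers = vp_xy, inliers
    return best_vp, best_inliers
```

RANSAC is a natural fit here because detected segments in driving scenes mix true perspective lines with clutter; sampling minimal two-line hypotheses and scoring by angular consensus keeps the estimate robust to such outliers, which is consistent with the "robust" qualifier in the abstract.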

Keywords

monocular video depth estimation; heterogeneous information fusion; temporal consistency; semantic and panoptic segmentation; vanishing point estimation
Title
Semantic-Guided Spatial and Temporal Fusion Framework for Enhancing Monocular Video Depth Estimation
Authors
Kim, Hyunsu; Lee, Yeongseop; Ko, Hyunseong; Jeong, Junho; Son, Yunsik
DOI
10.3390/app16010212
Publication Date
2025-12
Type
Article
Journal
Applied Sciences
Volume
16
Issue
1
Pages
1–26