EdgeV-SE: Self-Reflective Fine-Tuning Framework for Edge-Deployable Vision-Language Models (open access)
- Authors
- Jeon, Yoonmo; Lee, Seunghun; Kim, Woongsup
- Issue Date
- Jan-2026
- Publisher
- MDPI
- Keywords
- Vision-Language Model (VLM); edge computing; self-reflective learning; consistency regularization; mutual learning; satellite IoT; NVIDIA Jetson; disaster analysis
- Citation
- Applied Sciences, v.16, no.2, pp. 1-31
- Pages
- 31
- Indexed
- SCIE
- SCOPUS
- Journal Title
- Applied Sciences
- Volume
- 16
- Number
- 2
- Start Page
- 1
- End Page
- 31
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/63570
- DOI
- 10.3390/app16020818
- ISSN
- 2076-3417
- Abstract
- Featured Application: The proposed framework enables the deployment of robust Vision-Language Models on resource-constrained, off-the-shelf edge devices such as the NVIDIA Jetson series. Its primary application is real-time disaster damage assessment from satellite imagery in communication-denied environments, supporting immediate decision-making by first responders.
- Abstract: The deployment of Vision-Language Models (VLMs) in Satellite IoT scenarios is critical for real-time disaster assessment, but it is often hindered by the substantial memory and compute requirements of state-of-the-art models. While parameter-efficient fine-tuning (PEFT) enables adaptation with minimal computational overhead, standard supervised methods often fail to ensure robustness and reliability on resource-constrained edge devices. To address this, we propose EdgeV-SE, a self-reflective fine-tuning framework that significantly improves VLM performance without introducing any inference-time overhead. The framework incorporates an uncertainty-aware self-reflection mechanism with asymmetric dual pathways: a generative linguistic pathway and an auxiliary discriminative visual pathway. By estimating uncertainty in the linguistic pathway from the log-likelihood margin between class verbalizers, EdgeV-SE identifies ambiguous samples and refines its decision boundaries through consistency regularization and cross-pathway mutual learning. Experimental results on hurricane damage assessment show that the approach improves image classification accuracy, strengthens image-text semantic alignment, and yields better caption quality. Notably, these gains are achieved while remaining practically deployable on a commercial off-the-shelf edge device such as the NVIDIA Jetson Orin Nano, with inference latency and memory footprint preserved.
Overall, this work contributes a unified self-reflective fine-tuning framework that improves the robustness, calibration, and deployability of VLMs on edge devices.
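The abstract names the ingredients (a log-likelihood margin between class verbalizers, consistency regularization, cross-pathway mutual learning) but not the exact loss. The sketch below is one hypothetical reading in plain Python, not the authors' formulation. Assumptions: a verbalizer's log-likelihood is the sum of its token log-probabilities; a sample counts as "ambiguous" when the top-two margin falls below a threshold `tau`; both consistency (between two augmented views of the linguistic scores) and mutual learning (between linguistic and visual class distributions) are measured with a symmetric KL divergence and applied only to ambiguous samples; `tau`, `lambda_cons`, and `lambda_mutual` are illustrative hyperparameters.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of class scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def verbalizer_loglik(token_logprobs: Sequence[float]) -> float:
    """Assumed: log-likelihood of one class verbalizer = sum of its token log-probs."""
    return float(sum(token_logprobs))


def loglik_margin(class_loglik: Sequence[float]) -> float:
    """Gap between the best and runner-up verbalizer log-likelihoods."""
    top = sorted(class_loglik, reverse=True)
    return top[0] - top[1]


def is_ambiguous(class_loglik: Sequence[float], tau: float) -> bool:
    """A small margin means the linguistic pathway is uncertain (tau is hypothetical)."""
    return loglik_margin(class_loglik) < tau


def kl_div(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(P || Q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def symmetric_kl(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Average of KL in both directions after normalizing each score vector."""
    la, lb = log_softmax(scores_a), log_softmax(scores_b)
    return 0.5 * (kl_div(la, lb) + kl_div(lb, la))


def self_reflective_loss(
    ce_loss: float,
    ling_scores: Sequence[float],      # verbalizer log-likelihoods, view 1
    ling_scores_aug: Sequence[float],  # verbalizer log-likelihoods, augmented view
    vis_logits: Sequence[float],       # auxiliary visual-head logits (training only)
    tau: float = 1.0,
    lambda_cons: float = 0.5,
    lambda_mutual: float = 0.5,
) -> float:
    """Hypothetical total loss: supervised term plus reflection terms on ambiguous samples."""
    if not is_ambiguous(ling_scores, tau):
        return ce_loss  # confident sample: plain supervised loss only
    consistency = symmetric_kl(ling_scores, ling_scores_aug)
    mutual = symmetric_kl(ling_scores, vis_logits)
    return ce_loss + lambda_cons * consistency + lambda_mutual * mutual


if __name__ == "__main__":
    # Two hypothetical verbalizers, e.g. "damage" vs. "no damage".
    ling = [verbalizer_loglik([-0.4, -0.3]), verbalizer_loglik([-0.5, -0.4])]
    print("margin:", loglik_margin(ling), "ambiguous:", is_ambiguous(ling, tau=1.0))
    print("loss:", self_reflective_loss(0.3, ling, [-0.9, -0.7], [0.2, 1.1]))
```

In this sketch the visual logits feed only the training loss, so discarding the visual head after fine-tuning leaves the deployed generative path unchanged; that is one way a design like this could avoid inference-time overhead, though the paper's actual mechanism may differ.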
- Appears in Collections
- College of Engineering > Department of Information and Communication Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.