Detailed Information

EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing (Open Access)

Authors
Jo, Donghyeok; Kim, Jun-Hwa; Jeon, Jihoon; Won, Chee Sun
Issue Date
2025
Publisher
IEEE
Keywords
Visualization; Feature extraction; Spectrogram; Source separation; Real-time systems; Computational modeling; Performance evaluation; Instruments; Fuses; Streaming media; Audio-visual deep learning; on-screen sound separation; edge computing
Citation
IEEE Access, v.13, pp. 6387-6396
Pages
10
Indexed
SCIE
SCOPUS
Journal Title
IEEE Access
Volume
13
Start Page
6387
End Page
6396
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/57567
DOI
10.1109/ACCESS.2025.3526757
ISSN
2169-3536
Abstract
The ability to identify specific sounds in noisy environments can be improved by incorporating visual information through audio-visual integration, leveraging visual cues such as lip reading and sound-producing object recognition. Recent advancements in deep learning have enabled effective audio-visual sound source separation methods. Simultaneously, the increasing adoption of wearable devices capable of processing audio-visual information has further driven the demand for On-screen Sound source Separation (OSS), particularly in dynamic, egocentric scenarios. However, OSS in these scenarios poses several technical challenges, such as adapting to rapidly changing perspectives, ensuring real-time performance on resource-constrained edge devices, and developing computationally efficient learning strategies. To address these challenges, we propose EgoSep, a method designed for Egocentric On-screen Sound Source Separation (Ego-OSS). EgoSep integrates appearance and motion features from visual data with audio features extracted using a U-Net-based encoder, enabling robust separation in dynamic environments. The method is evaluated using the signal-to-noise ratio (SNR), treating on-screen sounds as signals and off-screen sounds as noise. For the experiments, we combine two public datasets: EPIC-KITCHENS, a large-scale egocentric video dataset, and ESC-50, an audio-only dataset. We simulate realistic scenarios by mixing EPIC-KITCHENS on-screen sounds with ESC-50 off-screen noise. Experimental results show that EgoSep effectively suppresses noise (i.e., off-screen sounds), improving the SNR of the test data from 3.05 dB at the input to 10.01 dB at the output. Additionally, real-time feasibility is validated on the NVIDIA Jetson Nano Developer Kit, achieving a real-time factor (RTF) of 0.17, demonstrating its practicality for wearable applications. The audio-mixed datasets and some results are available at https://donghyeok-jo.github.io/Ego-OSS.
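The abstract reports results in terms of SNR (in dB) and the real-time factor. The paper's exact evaluation code is not reproduced here; the sketch below only illustrates the standard textbook formulas those metrics refer to, with hypothetical function names:

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise),
    where power is the sum of squared samples. In the paper's setup the
    'signal' is the on-screen sound and the 'noise' is the off-screen sound."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum(n * n for n in noise)
    return 10.0 * math.log10(p_signal / p_noise)

def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration.
    RTF < 1.0 means the system keeps up with the audio stream;
    the reported 0.17 on the Jetson Nano is well inside real time."""
    return processing_seconds / audio_seconds
```

For example, equal-power signal and noise give 0 dB, and processing a 1-second clip in 0.17 seconds gives an RTF of 0.17.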
Appears in Collections
College of Engineering > Department of Electronics and Electrical Engineering > 1. Journal Articles