EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing

Full metadata record
DC Field | Value | Language
dc.contributor.author | Jo, Donghyeok | -
dc.contributor.author | Kim, Jun-Hwa | -
dc.contributor.author | Jeon, Jihoon | -
dc.contributor.author | Won, Chee Sun | -
dc.date.accessioned | 2025-02-04T05:00:12Z | -
dc.date.available | 2025-02-04T05:00:12Z | -
dc.date.issued | 2025 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/57567 | -
dc.description.abstract | The ability to identify specific sounds in noisy environments can be improved by incorporating visual information through audio-visual integration, leveraging visual cues such as lip reading and sound-producing object recognition. Recent advancements in deep learning have enabled effective audio-visual sound source separation methods. Simultaneously, the increasing adoption of wearable devices capable of processing audio-visual information has further driven the demand for On-screen Sound source Separation (OSS), particularly in dynamic, egocentric scenarios. However, OSS in these scenarios still faces several technical challenges, such as adapting to rapidly changing perspectives, ensuring real-time performance on resource-constrained edge devices, and developing computationally efficient learning strategies. To address these challenges, we propose EgoSep, a method designed for Egocentric On-screen Sound Source Separation (Ego-OSS). EgoSep integrates appearance and motion features from visual data with audio features extracted using a U-Net-based encoder, enabling robust separation in dynamic environments. The method is evaluated using the signal-to-noise ratio (SNR), treating on-screen sounds as signals and off-screen sounds as noise. For the experiments, we combine two public datasets: EPIC-KITCHENS, a large-scale egocentric video dataset, and ESC-50, an audio-only dataset. We simulate realistic scenarios by mixing EPIC-KITCHENS on-screen sounds with ESC-50 off-screen noise. Experimental results show that EgoSep effectively suppresses noise (i.e., off-screen sounds), improving the SNR of the test data from 3.05 dB at the input to 10.01 dB at the output. Additionally, real-time feasibility is validated on the NVIDIA Jetson Nano Developer Kit, achieving a real-time factor (RTF) of 0.17, demonstrating its practicality for wearable applications. The audio-mixed datasets and some results are available at https://donghyeok-jo.github.io/Ego-OSS. | -
dc.format.extent | 10 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | IEEE | -
dc.title | EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1109/ACCESS.2025.3526757 | -
dc.identifier.scopusid | 2-s2.0-85214483599 | -
dc.identifier.wosid | 001397807300037 | -
dc.identifier.bibliographicCitation | IEEE Access, v.13, pp. 6387-6396 | -
dc.citation.title | IEEE Access | -
dc.citation.volume | 13 | -
dc.citation.startPage | 6387 | -
dc.citation.endPage | 6396 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Telecommunications | -
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Telecommunications | -
dc.subject.keywordAuthor | Visualization | -
dc.subject.keywordAuthor | Feature extraction | -
dc.subject.keywordAuthor | Spectrogram | -
dc.subject.keywordAuthor | Source separation | -
dc.subject.keywordAuthor | Real-time systems | -
dc.subject.keywordAuthor | Computational modeling | -
dc.subject.keywordAuthor | Performance evaluation | -
dc.subject.keywordAuthor | Instruments | -
dc.subject.keywordAuthor | Fuses | -
dc.subject.keywordAuthor | Streaming media | -
dc.subject.keywordAuthor | Audio-visual deep learning | -
dc.subject.keywordAuthor | on-screen sound separation | -
dc.subject.keywordAuthor | edge computing | -
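
Illustrative note on the metrics named in the abstract: the record reports an SNR (on-screen sound as signal, off-screen sound as noise) and a real-time factor (RTF), but does not give their formulas. The sketch below is an assumption-laden illustration, not the authors' code: it uses the common power-ratio definition of SNR in dB, the usual definition of RTF as processing time divided by audio duration, and a hypothetical helper `mix_at_snr` that scales the off-screen track to reach a chosen input SNR. The toy sine tones stand in for EPIC-KITCHENS and ESC-50 clips, and the 1.7 s / 10 s timing is made up to reproduce an RTF of 0.17.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in dB, with `signal` as on-screen sound and `noise` as off-screen sound."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(on_screen, off_screen, target_snr_db):
    """Hypothetical helper: scale the off-screen track so the mixture has the target SNR."""
    ratio = 10 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(on_screen) / (power(off_screen) * ratio))
    scaled = [gain * n for n in off_screen]
    mixture = [s + n for s, n in zip(on_screen, scaled)]
    return mixture, scaled


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; values below 1 keep up with real time."""
    return processing_seconds / audio_seconds


# Toy data (assumption): a 440 Hz tone as on-screen sound, a 1 kHz tone as off-screen noise.
sr = 16000
on_screen = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
off_screen = [0.3 * math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr)]

mixture, scaled_noise = mix_at_snr(on_screen, off_screen, 3.05)
input_snr = snr_db(on_screen, scaled_noise)

# Made-up timing: 1.7 s of compute for 10 s of audio.
rtf = real_time_factor(processing_seconds=1.7, audio_seconds=10.0)
print(f"input SNR: {input_snr:.2f} dB, RTF: {rtf:.2f}")
```

Under these definitions, an RTF of 0.17 means that one second of audio takes about 0.17 seconds to process, and the reported gain from 3.05 dB to 10.01 dB corresponds to roughly a fivefold reduction in off-screen power relative to on-screen power.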
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Engineering > Department of Electronics and Electrical Engineering > 1. Journal Articles
