EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jo, Donghyeok | - |
| dc.contributor.author | Kim, Jun-Hwa | - |
| dc.contributor.author | Jeon, Jihoon | - |
| dc.contributor.author | Won, Chee Sun | - |
| dc.date.accessioned | 2025-02-04T05:00:12Z | - |
| dc.date.available | 2025-02-04T05:00:12Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.issn | 2169-3536 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/57567 | - |
| dc.description.abstract | The ability to identify specific sounds in noisy environments can be improved by incorporating visual information through audio-visual integration, leveraging visual cues such as lip reading and sound-producing object recognition. Recent advancements in deep learning have enabled effective audio-visual sound source separation methods. Simultaneously, the increasing adoption of wearable devices capable of processing audio-visual information has further driven the demand for On-screen Sound source Separation (OSS), particularly in dynamic, egocentric scenarios. However, OSS in these scenarios still faces several technical challenges, such as adapting to rapidly changing perspectives, ensuring real-time performance on resource-constrained edge devices, and developing computationally efficient learning strategies. To address these challenges, we propose EgoSep, a method designed for Egocentric On-screen Sound Source Separation (Ego-OSS). EgoSep integrates appearance and motion features from visual data with audio features extracted using a U-Net-based encoder, enabling robust separation in dynamic environments. The method is evaluated using the signal-to-noise ratio (SNR), treating on-screen sounds as signals and off-screen sounds as noise. For the experiments, we combine two public datasets: EPIC-KITCHENS, a large-scale egocentric video dataset, and ESC-50, an audio-only dataset. We simulate realistic scenarios by mixing EPIC-KITCHENS on-screen sounds with ESC-50 off-screen noise. Experimental results show that EgoSep effectively suppresses noise (i.e., off-screen sounds), improving the SNR of the test data from 3.05 dB at the input to 10.01 dB at the output. Additionally, real-time feasibility is validated on the NVIDIA Jetson Nano Developer Kit, achieving a real-time factor (RTF) of 0.17, demonstrating its practicality for wearable applications. The audio-mixed datasets and some results are available at https://donghyeok-jo.github.io/Ego-OSS. | - |
| dc.format.extent | 10 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | IEEE | - |
| dc.title | EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ACCESS.2025.3526757 | - |
| dc.identifier.scopusid | 2-s2.0-85214483599 | - |
| dc.identifier.wosid | 001397807300037 | - |
| dc.identifier.bibliographicCitation | IEEE Access, v.13, pp 6387 - 6396 | - |
| dc.citation.title | IEEE Access | - |
| dc.citation.volume | 13 | - |
| dc.citation.startPage | 6387 | - |
| dc.citation.endPage | 6396 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Telecommunications | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.relation.journalWebOfScienceCategory | Telecommunications | - |
| dc.subject.keywordAuthor | Visualization | - |
| dc.subject.keywordAuthor | Feature extraction | - |
| dc.subject.keywordAuthor | Spectrogram | - |
| dc.subject.keywordAuthor | Source separation | - |
| dc.subject.keywordAuthor | Real-time systems | - |
| dc.subject.keywordAuthor | Computational modeling | - |
| dc.subject.keywordAuthor | Performance evaluation | - |
| dc.subject.keywordAuthor | Instruments | - |
| dc.subject.keywordAuthor | Fuses | - |
| dc.subject.keywordAuthor | Streaming media | - |
| dc.subject.keywordAuthor | Audio-visual deep learning | - |
| dc.subject.keywordAuthor | on-screen sound separation | - |
| dc.subject.keywordAuthor | edge computing | - |
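The abstract reports two evaluation metrics: the signal-to-noise ratio (SNR), treating on-screen sounds as signal and off-screen sounds as noise, and the real-time factor (RTF) on the Jetson Nano. The standard definitions of these metrics can be sketched as follows (a minimal illustration only; the function names are mine, and this is not the authors' evaluation code):

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """SNR in decibels: 10 * log10(signal power / noise power).
    In the Ego-OSS setting, `signal` would be the on-screen audio
    and `noise` the residual off-screen audio."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of audio processed.
    Values below 1.0 indicate faster-than-real-time operation."""
    return processing_seconds / audio_seconds

# Illustrative values (not from the paper): a signal with 100x the
# noise power corresponds to 20 dB SNR.
sig = np.full(100, 10.0)
noi = np.full(100, 1.0)
print(snr_db(sig, noi))            # 20.0 dB
print(real_time_factor(1.7, 10.0)) # 0.17, i.e. real-time capable
```

By this definition, the reported RTF of 0.17 means EgoSep needs about 0.17 s of computation per second of audio, leaving headroom for real-time use on edge hardware.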