EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jo, Donghyeok | - |
| dc.contributor.author | Kim, Jun-Hwa | - |
| dc.contributor.author | Jeon, Jihoon | - |
| dc.contributor.author | Won, Chee Sun | - |
| dc.date.accessioned | 2025-02-04T05:00:12Z | - |
| dc.date.available | 2025-02-04T05:00:12Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.issn | 2169-3536 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/57567 | - |
| dc.description.abstract | The ability to identify specific sounds in noisy environments can be improved by incorporating visual information through audio-visual integration, leveraging visual cues such as lip reading and sound-producing object recognition. Recent advancements in deep learning have enabled effective audio-visual sound source separation methods. Simultaneously, the increasing adoption of wearable devices capable of processing audio-visual information has further driven the demand for On-screen Sound source Separation (OSS), particularly in dynamic, egocentric scenarios. However, OSS in these scenarios still faces several technical challenges, such as adapting to rapidly changing perspectives, ensuring real-time performance on resource-constrained edge devices, and developing computationally efficient learning strategies. To address these challenges, we propose EgoSep, a method designed for Egocentric On-screen Sound Source Separation (Ego-OSS). EgoSep integrates appearance and motion features from visual data with audio features extracted using a U-Net-based encoder, enabling robust separation in dynamic environments. The method is evaluated using the signal-to-noise ratio (SNR), treating on-screen sounds as signals and off-screen sounds as noise. For the experiments, we combine two public datasets: EPIC-KITCHENS, a large-scale egocentric video dataset, and ESC-50, an audio-only dataset. We simulate realistic scenarios by mixing EPIC-KITCHENS on-screen sounds with ESC-50 off-screen noise. Experimental results show that EgoSep effectively suppresses noise (i.e., off-screen sounds), improving the SNR of the test data from 3.05 dB at the input to 10.01 dB at the output. Additionally, real-time feasibility is validated on the NVIDIA Jetson Nano Developer Kit, achieving a real-time factor (RTF) of 0.17, demonstrating its practicality for wearable applications. The audio-mixed datasets and some results are available at https://donghyeok-jo.github.io/Ego-OSS. | - |
| dc.format.extent | 10 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | IEEE | - |
| dc.title | EgoSep: Egocentric On-Screen Sound Source Separation for Real-Time Edge Computing | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ACCESS.2025.3526757 | - |
| dc.identifier.scopusid | 2-s2.0-85214483599 | - |
| dc.identifier.wosid | 001397807300037 | - |
| dc.identifier.bibliographicCitation | IEEE Access, v.13, pp 6387 - 6396 | - |
| dc.citation.title | IEEE Access | - |
| dc.citation.volume | 13 | - |
| dc.citation.startPage | 6387 | - |
| dc.citation.endPage | 6396 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Telecommunications | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.relation.journalWebOfScienceCategory | Telecommunications | - |
| dc.subject.keywordAuthor | Visualization | - |
| dc.subject.keywordAuthor | Feature extraction | - |
| dc.subject.keywordAuthor | Spectrogram | - |
| dc.subject.keywordAuthor | Source separation | - |
| dc.subject.keywordAuthor | Real-time systems | - |
| dc.subject.keywordAuthor | Computational modeling | - |
| dc.subject.keywordAuthor | Performance evaluation | - |
| dc.subject.keywordAuthor | Instruments | - |
| dc.subject.keywordAuthor | Fuses | - |
| dc.subject.keywordAuthor | Streaming media | - |
| dc.subject.keywordAuthor | Audio-visual deep learning | - |
| dc.subject.keywordAuthor | on-screen sound separation | - |
| dc.subject.keywordAuthor | edge computing | - |
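The abstract reports two evaluation metrics: the signal-to-noise ratio (SNR), treating on-screen sounds as signal and off-screen sounds as noise, and the real-time factor (RTF) on the Jetson Nano. The standard definitions of these metrics can be sketched as follows (a minimal illustration only; the function names are mine, and this is not the authors' evaluation code):

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """SNR in decibels: 10 * log10(signal power / noise power).
    In the Ego-OSS setting, `signal` would be the on-screen audio
    and `noise` the residual off-screen audio."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of audio processed.
    Values below 1.0 indicate faster-than-real-time operation."""
    return processing_seconds / audio_seconds

# Illustrative values (not from the paper): a signal with 100x the
# noise power corresponds to 20 dB SNR.
sig = np.full(100, 10.0)
noi = np.full(100, 1.0)
print(snr_db(sig, noi))            # 20.0 dB
print(real_time_factor(1.7, 10.0)) # 0.17, i.e. real-time capable
```

By this definition, the reported RTF of 0.17 means EgoSep needs about 0.17 s of computation per second of audio, leaving headroom for real-time use on edge hardware.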