Audio-Visual Action Recognition Using Transformer Fusion Network
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Jun-Hwa | - |
| dc.contributor.author | Won, Chee Sun | - |
| dc.date.accessioned | 2024-08-08T08:01:05Z | - |
| dc.date.available | 2024-08-08T08:01:05Z | - |
| dc.date.issued | 2024-02 | - |
| dc.identifier.issn | 2076-3417 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/20028 | - |
| dc.description.abstract | Our approach to action recognition is grounded in the intrinsic coexistence of and complementary relationship between audio and visual information in videos. Going beyond the traditional emphasis on visual features, we propose a transformer-based network that integrates both audio and visual data as inputs. This network is designed to accept and process spatial, temporal, and audio modalities. Features from each modality are extracted using a single Swin Transformer, originally devised for still images. Subsequently, these extracted features from spatial, temporal, and audio data are adeptly combined using a novel modal fusion module (MFM). Our transformer-based network effectively fuses these three modalities, resulting in a robust solution for action recognition. | - |
| dc.format.extent | 13 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | MDPI | - |
| dc.title | Audio-Visual Action Recognition Using Transformer Fusion Network | - |
| dc.type | Article | - |
| dc.publisher.location | Switzerland | - |
| dc.identifier.doi | 10.3390/app14031190 | - |
| dc.identifier.scopusid | 2-s2.0-105019297722 | - |
| dc.identifier.wosid | 001160022100001 | - |
| dc.identifier.bibliographicCitation | Applied Sciences, v.14, no.3, pp 1 - 13 | - |
| dc.citation.title | Applied Sciences | - |
| dc.citation.volume | 14 | - |
| dc.citation.number | 3 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 13 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Chemistry | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Materials Science | - |
| dc.relation.journalResearchArea | Physics | - |
| dc.relation.journalWebOfScienceCategory | Chemistry, Multidisciplinary | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Multidisciplinary | - |
| dc.relation.journalWebOfScienceCategory | Materials Science, Multidisciplinary | - |
| dc.relation.journalWebOfScienceCategory | Physics, Applied | - |
| dc.subject.keywordAuthor | action recognition | - |
| dc.subject.keywordAuthor | multi modal | - |
| dc.subject.keywordAuthor | deep learning | - |
| dc.subject.keywordAuthor | video | - |
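The abstract describes extracting one feature representation per modality (spatial, temporal, audio) with a shared Swin Transformer and then combining them in a modal fusion module (MFM). The paper's actual MFM is not reproduced here; as a rough illustration of attention-style fusion, the sketch below weights three modality feature vectors by the softmax of their similarity to a query vector (fixed here, learned in practice). All names (`fuse`, `query`) are illustrative assumptions, not the authors' code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse(spatial, temporal, audio, query):
    """Toy attention-style fusion of three modality feature vectors.

    Each modality contributes in proportion to the softmax of its
    scaled dot-product similarity with the query vector.
    Returns the fused vector and the per-modality weights.
    """
    feats = [spatial, temporal, audio]
    scale = math.sqrt(len(query))
    weights = softmax([dot(f, query) / scale for f in feats])
    fused = [sum(w * f[i] for w, f in zip(weights, feats))
             for i in range(len(query))]
    return fused, weights

# With orthogonal unit features and a uniform query, all three
# modalities receive equal weight (1/3 each).
spatial  = [1.0, 0.0, 0.0]
temporal = [0.0, 1.0, 0.0]
audio    = [0.0, 0.0, 1.0]
fused, weights = fuse(spatial, temporal, audio, query=[1.0, 1.0, 1.0])
```

In the paper the fusion operates on transformer token features and the weighting is learned end to end; this sketch only conveys the weighted-combination idea.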
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
