Advanced Facial Analysis in Multi-Modal Data with Cascaded Cross-Attention based Transformer
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Jun-Hwa | - |
| dc.contributor.author | Kim, Namho | - |
| dc.contributor.author | Hong, Minsoo | - |
| dc.contributor.author | Won, Chee Sun | - |
| dc.date.accessioned | 2024-11-11T08:00:09Z | - |
| dc.date.available | 2024-11-11T08:00:09Z | - |
| dc.date.issued | 2024-09 | - |
| dc.identifier.issn | 2160-7508 | - |
| dc.identifier.issn | 2160-7516 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/56181 | - |
| dc.description.abstract | One of the most crucial elements in deeply understanding humans on a psychological level is manifested through facial expressions. The analysis of human behavior can be informed by facial expressions, making it essential to employ indicators such as expression (EXPR), valence-arousal (VA), and action units (AU). In this paper, we introduce the method proposed for the Challenge of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) at CVPR 2024. Our proposed method utilizes the multi-modal Aff-Wild2 dataset, which is split into visual and audio modalities. For the visual data, we extract features using the SimMIM model that was pre-trained on a diverse set of facial expression data. For the audio data, we extract features using the Wav2Vec model. Then, to fuse the extracted visual and audio features, we propose a cascaded cross-attention mechanism in a transformer. Our approach achieved average F1 scores of 0.4652 and 0.3005 on the AU and the EXPR tracks, respectively, and an average Concordance Correlation Coefficient (CCC) of 0.5077, outperforming the baseline performance on all tracks of the ABAW6 competition. Our approach placed 5th, 6th, and 7th on the AU, the EXPR, and the VA tracks, respectively. The code used in the 6th ABAW competition is available at https://github.com/namho-96/ABAW2024. © 2024 IEEE. | - |
| dc.format.extent | 8 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | IEEE | - |
| dc.title | Advanced Facial Analysis in Multi-Modal Data with Cascaded Cross-Attention based Transformer | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/CVPRW63382.2024.00784 | - |
| dc.identifier.scopusid | 2-s2.0-85206483361 | - |
| dc.identifier.wosid | 001327781708005 | - |
| dc.identifier.bibliographicCitation | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 7870 - 7877 | - |
| dc.citation.title | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) | - |
| dc.citation.startPage | 7870 | - |
| dc.citation.endPage | 7877 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
| dc.subject.keywordAuthor | ABAW | - |
| dc.subject.keywordAuthor | Cross-attention | - |
| dc.subject.keywordAuthor | Facial Analysis | - |
| dc.subject.keywordAuthor | Transformer | - |
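The abstract describes fusing SimMIM visual features and Wav2Vec audio features with a cascaded cross-attention mechanism. The authors' actual implementation is in the linked GitHub repository; the following is only a minimal single-head sketch of the general idea, with assumed feature dimensions, random arrays standing in for the extracted features, and random matrices standing in for learned projection weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64            # shared feature dimension (assumed)
T_v, T_a = 10, 20 # visual / audio sequence lengths (assumed)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(query, key_value, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention:
    one modality supplies the queries, the other the keys/values."""
    Q, K, V = query @ Wq, key_value @ Wk, key_value @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

# Stand-ins for SimMIM visual and Wav2Vec audio feature sequences.
visual = rng.standard_normal((T_v, d))
audio = rng.standard_normal((T_a, d))

# Projection weights (learned in practice; random for illustration).
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(6)]

# Cascade: visual queries first attend to the audio stream, then the
# fused result attends back to the visual stream (one plausible order).
fused_av = cross_attention(visual, audio, *W[:3])
fused = cross_attention(fused_av, visual, *W[3:])

print(fused.shape)  # → (10, 64)
```

The cascade order here (visual → audio, then back to visual) and the single attention head are illustrative choices; the paper's transformer stacks such cross-attention blocks with learned weights.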
