SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild

Moon, Jiyong; Jang, Hyeryung; Park, Seongsik

SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild (open access)

Authors: Moon, Jiyong; Jang, Hyeryung; Park, Seongsik

Issue Date: Apr-2025

Publisher: IEEE

Keywords: Contrastive learning; facial expression recognition; masked image modeling; self-supervised learning

Citation: IEEE Transactions on Affective Computing, v. 16, no. 2, pp. 799-813

Pages: 15

Indexed: SCIE; SCOPUS

Journal Title: IEEE Transactions on Affective Computing

Volume: 16

Number: 2

Start Page: 799

End Page: 813

URI: https://scholarworks.dongguk.edu/handle/sw.dongguk/26414

DOI: 10.1109/TAFFC.2024.3470980

ISSN: 2371-9850; 1949-3045

Abstract: Facial expression recognition in the wild (FER-W) entails classifying facial emotions in natural environments. The major challenges in FER-W stem from the complexity and ambiguity of facial images, which make it difficult to curate a large-scale labeled dataset for training. Additionally, the subtle differences between emotions often reside in the fine-grained details of local facial landmarks, demanding solutions that capture these crucial features efficiently. To address these issues, we employ two distinct self-supervised methods. First, we adopt a contrastive learning method to capture generalized global representations, enabling the model to understand the semantic context of facial expressions without relying on labeled data. Simultaneously, we leverage masked image modeling to embed fine-grained, local facial landmark information at the patch level. We introduce a novel module called FaceMAE, which aims to reconstruct the masked facial patches. The semantic masking scheme is designed to preserve highly activated features, allowing the encoding of crucial details of unmasked facial landmarks and their relationships within the broader facial context at the patch level. This in turn guides the backbone network to calibrate the learned global features to be attentive to facial landmarks. Our proposed method, called Simple Facial Landmark Encoding (SimFLE), significantly outperforms the supervised baseline and other self-supervised methods in terms of facial landmark localization and overall performance, as demonstrated through extensive experiments across several FER-W benchmarks.
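
The abstract states that the semantic masking scheme preserves highly activated features so that unmasked facial landmarks can be encoded, but it does not give the selection rule. The following is a minimal sketch of one possible reading, assuming each patch has a scalar activation score and that the top-scoring fraction of patches stays visible while the rest are masked. The names `semantic_mask`, `patch_scores`, and `keep_ratio`, and the value 0.25, are hypothetical and do not come from the paper.

```python
def semantic_mask(patch_scores, keep_ratio=0.25):
    """Return one boolean per patch, True where the patch is masked.

    patch_scores: one activation score per patch (hypothetical input, e.g. a
        per-patch summary of a backbone feature map).
    keep_ratio: fraction of patches left visible (assumed, not from the paper).
    """
    n = len(patch_scores)
    # Keep at least one patch visible so there is something to encode.
    n_keep = max(1, round(n * keep_ratio))
    # Rank patch indices from most to least activated.
    ranked = sorted(range(n), key=lambda i: patch_scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    # Everything outside the top-activated set is masked for reconstruction.
    return [i not in keep for i in range(n)]


# Toy example: 8 patches, the 2 most activated ones remain visible.
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.7, 0.4]
mask = semantic_mask(scores, keep_ratio=0.25)
visible = [i for i, m in enumerate(mask) if not m]
```

In this toy run the visible patches are the two with the highest scores; a reconstruction module would then be trained to predict the masked ones from them. How scores are actually computed and how many patches are kept in SimFLE is not stated in this record.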

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles

Related Researcher

Jang, Hye Ryung: College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)
