SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild (open access)
- Authors
- Moon, Jiyong; Jang, Hyeryung; Park, Seongsik
- Issue Date
- Apr-2025
- Publisher
- IEEE
- Keywords
- Contrastive learning; facial expression recognition; masked image modeling; self-supervised learning
- Citation
- IEEE Transactions on Affective Computing, v.16, no.2, pp 799 - 813
- Pages
- 15
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE Transactions on Affective Computing
- Volume
- 16
- Number
- 2
- Start Page
- 799
- End Page
- 813
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/26414
- DOI
- 10.1109/TAFFC.2024.3470980
- ISSN
- 2371-9850; 1949-3045
- Abstract
- Facial expression recognition in the wild (FER-W) entails classifying facial emotions in natural environments. The major challenges in FER-W stem from the complexity and ambiguity of facial images, which make it difficult to curate a large-scale labeled dataset for training. Additionally, the subtle differences between emotions often reside in the fine-grained details of local facial landmarks, demanding innovative solutions to capture these crucial features efficiently. To address these issues, we employ two distinct self-supervised methods. First, we adopt a contrastive learning method to capture generalized global representations, enabling the model to understand the semantic context of facial expressions without relying on labeled data. Simultaneously, we leverage masked image modeling to embed fine-grained, local facial landmark information at the patch level. We introduce a novel module called FaceMAE, which aims to reconstruct the masked facial patches. Its semantic masking scheme is designed to preserve highly activated features, allowing the module to encode crucial details of unmasked facial landmarks and their relationships within the broader facial context at the patch level. This in turn guides the backbone network to calibrate the learned global features to be attentive to facial landmarks. Our proposed method, called Simple Facial Landmark Encoding (SimFLE), significantly outperforms the supervised baseline and other self-supervised methods in terms of facial landmark localization and overall performance, as demonstrated through extensive experiments across several FER-W benchmarks. © 2010-2012 IEEE.
- Appears in Collections
- College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles
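
The semantic masking idea mentioned in the abstract (keeping highly activated patches visible and masking the rest for reconstruction) can be illustrated with a minimal sketch. This is not the authors' FaceMAE implementation: the function name `semantic_mask`, the `keep_ratio` parameter, and the use of one scalar activation score per patch are all assumptions made for illustration, since the abstract does not specify how activations are scored or how many patches are kept.

```python
from typing import List


def semantic_mask(activations: List[float], keep_ratio: float = 0.25) -> List[bool]:
    """Return a per-patch mask: True = masked (to be reconstructed), False = kept visible.

    Assumption: `activations` holds one scalar score per patch (for example a
    channel-averaged feature magnitude). The actual scoring used in the paper
    is not described in the abstract.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(activations) * keep_ratio))
    # Rank patch indices from most to least activated.
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    keep = set(ranked[:n_keep])
    # Every patch outside the top-activated set is masked.
    return [i not in keep for i in range(len(activations))]


# Hypothetical scores for 8 patches; the two strongest stay visible.
scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.2, 0.7, 0.15]
mask = semantic_mask(scores, keep_ratio=0.25)
visible = [i for i, m in enumerate(mask) if not m]
```

In a masked image modeling setup, the patches flagged `True` would be hidden from the encoder and serve as reconstruction targets, while the visible high-activation patches provide the landmark context.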
