Multi-View Masked Autoencoder for General Image Representation (open access)

Authors
Ji, Seungbin; Han, Sangkwon; Rhee, Jongtae
Issue Date
Nov-2023
Publisher
MDPI
Keywords
contrastive learning; deep learning; image representation learning; masked image modeling; self-supervised learning
Citation
Applied Sciences, v.13, no.22, pp. 1-15
Pages
15
Indexed
SCIE
SCOPUS
Journal Title
Applied Sciences
Volume
13
Number
22
Start Page
1
End Page
15
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/21918
DOI
10.3390/app132212413
ISSN
2076-3417
Abstract
Self-supervised learning learns general representations from unlabeled data. Masked image modeling (MIM), a generative self-supervised learning method, has drawn attention for its state-of-the-art performance on various downstream tasks, but its token-level approach yields poor linear separability. In this paper, we propose a contrastive learning-based multi-view masked autoencoder for MIM that adds an image-level approach by learning common features from two differently augmented views. The contrastive loss strengthens MIM by learning long-range global patterns. Our framework adopts a simple encoder-decoder architecture and learns rich and general representations through a simple process: (1) Two different views are generated from an input image by random masking, and a contrastive loss learns the semantic distance between the representations the encoder produces for them. A high mask ratio of 80% acts as strong augmentation and alleviates the representation collapse problem. (2) With a reconstruction loss, the decoder learns to reconstruct the original image from the masked image. We assessed our framework through several experiments on benchmark datasets for image classification, object detection, and semantic segmentation. We achieved 84.3% fine-tuning accuracy on ImageNet-1K classification and 76.7% in linear probing, exceeding previous studies, and obtained promising results on other downstream tasks. The experimental results demonstrate that applying contrastive loss to masked image modeling yields rich and general image representations.
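The two-step process in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of the contrastive loss, so an InfoNCE-style loss over cosine similarities is assumed here, and the temperature, the loss weight, the restriction of the reconstruction error to masked patches, and all function names are hypothetical choices made for illustration only. Only the 80% mask ratio and the use of two randomly masked views come from the abstract.

```python
import math
import random


def random_mask(num_patches, mask_ratio=0.8, rng=None):
    """Split patch indices into (visible, masked) for one view.

    mask_ratio=0.8 follows the abstract; everything else is an assumption.
    """
    rng = rng or random.Random()
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def cosine(u, v):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(z1, z2, temperature=0.2):
    """Assumed InfoNCE loss: z1[i] and z2[i] are the two views of image i,
    and every other z2[j] in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(z1):
        logits = [cosine(anchor, other) / temperature for other in z2]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denom - logits[i]
    return total / len(z1)


def masked_mse(pred, target, masked):
    """Mean squared error over masked patches only (an assumed convention)."""
    errs = [(p - t) ** 2 for i in masked for p, t in zip(pred[i], target[i])]
    return sum(errs) / len(errs)


def total_loss(recon, contrastive, weight=1.0):
    """Hypothetical weighted sum of the two objectives."""
    return recon + weight * contrastive
```

In this sketch, each of the two views would come from its own call to `random_mask` on the same image, the encoder outputs for the visible patches would be pooled into the vectors passed to `info_nce`, and the decoder output would be scored with `masked_mse`.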
Appears in Collections
College of Engineering > Department of Industrial and Systems Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.