Multi-View Masked Autoencoder for General Image Representation

Ji, Seungbin; Han, Sangkwon; Rhee, Jongtae

Cited 1 time in webofscience

Cited 1 time in scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ji, Seungbin	-
dc.contributor.author	Han, Sangkwon	-
dc.contributor.author	Rhee, Jongtae	-
dc.date.accessioned	2024-08-08T12:00:39Z	-
dc.date.available	2024-08-08T12:00:39Z	-
dc.date.issued	2023-11	-
dc.identifier.issn	2076-3417	-
dc.identifier.uri	https://scholarworks.dongguk.edu/handle/sw.dongguk/21918	-
dc.description.abstract	Self-supervised learning learns general representations from unlabeled data. Masked image modeling (MIM), a generative self-supervised learning method, has drawn attention for its state-of-the-art performance on various downstream tasks, but its token-level approach results in poor linear separability. In this paper, we propose a contrastive learning-based multi-view masked autoencoder for MIM that adds an image-level approach by learning common features from two different augmented views. The contrastive loss strengthens MIM by learning long-range global patterns. Our framework adopts a simple encoder-decoder architecture and learns rich and general representations through a simple process: (1) Two different views are generated from an input image with random masking, and the contrastive loss learns the semantic distance between the representations generated by the encoder. A high mask ratio of 80% works as a strong augmentation and alleviates the representation collapse problem. (2) With the reconstruction loss, the decoder learns to reconstruct the original image from the masked image. We assessed our framework through several experiments on benchmark datasets for image classification, object detection, and semantic segmentation. We achieved 84.3% fine-tuning accuracy and 76.7% linear probing accuracy on ImageNet-1K classification, exceeding previous studies, and obtained promising results on other downstream tasks. The experimental results demonstrate that our work can learn rich and general image representations by applying contrastive loss to masked image modeling.	-
dc.format.extent	15	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	MDPI	-
dc.title	Multi-View Masked Autoencoder for General Image Representation	-
dc.type	Article	-
dc.publisher.location	Switzerland	-
dc.identifier.doi	10.3390/app132212413	-
dc.identifier.scopusid	2-s2.0-85192366988	-
dc.identifier.wosid	001120741000001	-
dc.identifier.bibliographicCitation	Applied Sciences, v.13, no.22, pp 1 - 15	-
dc.citation.title	Applied Sciences	-
dc.citation.volume	13	-
dc.citation.number	22	-
dc.citation.startPage	1	-
dc.citation.endPage	15	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Chemistry	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Materials Science	-
dc.relation.journalResearchArea	Physics	-
dc.relation.journalWebOfScienceCategory	Chemistry, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Engineering, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Materials Science, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Physics, Applied	-
dc.subject.keywordAuthor	contrastive learning	-
dc.subject.keywordAuthor	deep learning	-
dc.subject.keywordAuthor	image representation learning	-
dc.subject.keywordAuthor	masked image modeling	-
dc.subject.keywordAuthor	self-supervised learning	-
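
The two-step process in the abstract (two randomly masked views with a contrastive loss, then reconstruction of the masked image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Only the 80% mask ratio comes from the abstract; the InfoNCE form of the contrastive loss, the mean squared error on masked patches, the temperature, the loss weight, and the linear stand-ins for the encoder and decoder (`w_enc`, `w_dec`) are all assumptions made for illustration.

```python
import numpy as np

MASK_RATIO = 0.8  # from the abstract; every other constant below is an assumption


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean array in which True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def info_nce(z1, z2, temperature=0.2):
    """InfoNCE loss (assumed form): row i of z1 and row i of z2 are a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only (assumed choice)."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)
batch, num_patches, patch_dim, embed_dim = 4, 196, 48, 32
patches = rng.normal(size=(batch, num_patches, patch_dim))  # toy patchified images

# Hypothetical linear stand-ins for the encoder and decoder networks.
w_enc = rng.normal(size=(patch_dim, embed_dim)) * 0.1
w_dec = rng.normal(size=(embed_dim, patch_dim)) * 0.1


def make_view(patches, rng):
    """Mask each image independently, encode it, and pool over visible patches."""
    masks = np.stack(
        [random_mask(num_patches, MASK_RATIO, rng) for _ in range(len(patches))]
    )
    visible = ~masks
    tokens = patches @ w_enc  # (batch, num_patches, embed_dim)
    pooled = (tokens * visible[..., None]).sum(axis=1) / visible.sum(
        axis=1, keepdims=True
    )
    # Toy decoder: predict every patch from the pooled image embedding.
    pred = np.broadcast_to((pooled @ w_dec)[:, None, :], patches.shape)
    return masks, pooled, pred


# Step 1: two masked views of the same images, compared with a contrastive loss.
mask_a, z_a, pred_a = make_view(patches, rng)
mask_b, z_b, pred_b = make_view(patches, rng)
contrastive = info_nce(z_a, z_b)

# Step 2: reconstruction loss on the masked patches of each view.
reconstruction = masked_mse(pred_a, patches, mask_a) + masked_mse(
    pred_b, patches, mask_b
)

loss_weight = 1.0  # assumed; the abstract does not say how the losses are combined
total_loss = reconstruction + loss_weight * contrastive
print(f"contrastive={contrastive:.4f} reconstruction={reconstruction:.4f}")
```

In this sketch each view hides a different random subset of patches, so the contrastive term compares image-level embeddings computed from mostly different visible regions, while the reconstruction term stays at the patch level.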

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Engineering > Department of Industrial and Systems Engineering > 1. Journal Articles
