Imitation Learning through Image Augmentation Using Enhanced Swin Transformer Model in Remote Sensing

Park, Yoojin; Sung, Yunsick

Detailed Information

Cited 1 time in webofscience

Cited 1 time in scopus

Metadata Downloads

Imitation Learning through Image Augmentation Using Enhanced Swin Transformer Model in Remote Sensingopen access

Authors: Park, Yoojin; Sung, Yunsick

Issue Date: Sep-2023

Publisher: MDPI

Keywords: action classification; data augmentation; deep learning; image processing; imitation learning; Swin Transformer

Citation: Remote Sensing, v.15, no.17, pp 1 - 15

Pages: 15

Indexed: SCIE
SCOPUS

Journal Title: Remote Sensing

Volume: 15

Number: 17

Start Page: 1

End Page: 15

URI: https://scholarworks.dongguk.edu/handle/sw.dongguk/18676

DOI: 10.3390/rs15174147

ISSN: 2072-4292
2072-4292

Abstract: In unmanned systems, remote sensing is an approach that collects and analyzes data such as visual images, infrared thermal images, and LiDAR sensor data from a distance using a system that operates without human intervention. Recent advancements in deep learning enable the direct mapping of input images in remote sensing to desired outputs, making it possible to learn through imitation learning and for unmanned systems to learn by collecting and analyzing those images. In the case of autonomous cars, raw high-dimensional data are collected using sensors, which are mapped to the values of steering and throttle through a deep learning network to train imitation learning. Therefore, by imitation learning, the unmanned systems observe expert demonstrations and learn expert policies, even in complex environments. However, in imitation learning, collecting and analyzing a large number of images from the game environment incurs time and costs. Training with a limited dataset leads to a lack of understanding of the environment. There are some augmentation approaches that have the limitation of increasing the dataset because of considering only the locations of objects visited and estimated. Therefore, it is required to consider the diverse kinds of the location of objects not visited to solve the limitation. This paper proposes an enhanced model to augment the number of training images comprising a Preprocessor, an enhanced Swin Transformer model, and an Action model. Using the original network structure of the Swin Transformer model for image augmentation in imitation learning is challenging. Therefore, the internal structure of the Swin Transformer model is enhanced, and the Preprocessor and Action model are combined to augment training images. The proposed method was verified through an experimental process by learning from expert demonstrations and augmented images, which reduced the total loss from 1.24068 to 0.41616. Compared to expert demonstrations, the accuracy was approximately 86.4%, and the proposed method achieved 920 points and 1200 points more than the comparison model to verify generalization. © 2023 by the authors.

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles

Show full item record

qrcode

Related Researcher

Researcher Sung, Yunsick photo

Sung, Yunsick: College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)

Read more

Altmetrics

Total Views & Downloads

RSS_1.0 RSS_2.0 ATOM_1.0

30, Pildong-ro 1-gil, Jung-gu, Seoul, 04620, Republic of Korea+82-2-2260-3114

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.

Detailed Information

Related Researcher

Altmetrics

Total Views & Downloads

BROWSE