StARformer: Transformer With State-Action-Reward Representations for Robot Learning (open access)
- Authors
- Shang, Jinghuan; Li, Xiang; Kahatapitiya, Kumara; Lee, Yu-Cheol; Ryoo, Michael S.
- Issue Date
- Nov-2023
- Publisher
- IEEE
- Keywords
- imitation learning; reinforcement learning; robot learning; Transformer
- Citation
- IEEE Transactions on Pattern Analysis and Machine Intelligence, v.45, no.11, pp. 12862-12877
- Pages
- 16
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Volume
- 45
- Number
- 11
- Start Page
- 12862
- End Page
- 12877
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/20484
- DOI
- 10.1109/TPAMI.2022.3204708
- ISSN
- 0162-8828; 1939-3539
- Abstract
- Reinforcement Learning (RL) can be considered a sequence modeling task, in which an agent uses a sequence of past state-action-reward experiences to predict a sequence of future actions. In this work, we propose the State-Action-Reward Transformer (StARformer), a Transformer architecture for robot learning with image inputs, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. StARformer first extracts StAR-representations by self-attending over patches of image states together with action and reward tokens within a short temporal window. These StAR-representations are then combined with pure image-state representations, extracted as convolutional features, to perform self-attention over the whole sequence. Our experimental results show that StARformer outperforms the state-of-the-art Transformer-based method on the image-based Atari and DeepMind Control Suite benchmarks, under both offline-RL and imitation learning settings. We find that models benefit from our combination of patch-wise and convolutional image embeddings. StARformer also handles longer input sequences better than the baseline method. Finally, we demonstrate how StARformer can be successfully applied to a real-world robot imitation learning setting via a human-following task.
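The two-stage attention described in the abstract can be sketched in plain NumPy. This is an illustrative toy, not the paper's implementation: the token embeddings are random stand-ins, the attention projections are untrained, causal masking and learned pooling are omitted, and all shapes (`T`, `n_patches`, `d`) are assumed for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, rng):
    # Single-head self-attention with random (untrained) projections.
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    att = softmax(q @ k.T / np.sqrt(d))
    return att @ v

rng = np.random.default_rng(0)
T, d = 4, 8          # number of timesteps, embedding size (assumed)
n_patches = 9        # image patches per state, e.g. a 3x3 grid (assumed)

# Hypothetical pre-embedded inputs: per timestep, patch tokens for the
# image state plus one action token and one reward token.
patch_tokens = rng.standard_normal((T, n_patches, d))
action_tokens = rng.standard_normal((T, d))
reward_tokens = rng.standard_normal((T, d))
conv_state_feats = rng.standard_normal((T, d))  # stands in for conv features

# Stage 1: attention within each short temporal window over
# [patches..., action, reward] yields one StAR-representation per step.
star_reps = []
for t in range(T):
    window = np.vstack([patch_tokens[t], action_tokens[t], reward_tokens[t]])
    star_reps.append(self_attention(window, rng).mean(axis=0))  # pool tokens
star_reps = np.stack(star_reps)  # shape (T, d)

# Stage 2: long-range attention over the interleaved sequence of
# convolutional state features and StAR-representations.
long_seq = np.empty((2 * T, d))
long_seq[0::2] = conv_state_feats
long_seq[1::2] = star_reps
out = self_attention(long_seq, rng)
print(out.shape)  # (8, 8)
```

The two-stage split mirrors the abstract's design: short-window attention injects the Markovian-like inductive bias, while the second pass models long-term structure over the full sequence.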
- Files in This Item
- There are no files associated with this item.
- Appears in
Collections - College of Engineering > Department of Information and Communication Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.