Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics (open access)

Authors
Tse, Tze Ho Elden; Feng, Runyang; Zheng, Linfang; Park, Jiho; Gao, Yixing; Kim, Jihie; Leonardis, Ales; Chang, Hyung Jin
Issue Date
Apr-2025
Publisher
Association for the Advancement of Artificial Intelligence
Keywords
Action Recognition; Bounding-box; Collaborative Learning; Object Interactions; Object Movements; Object Pose; Object Reconstruction; Pose-estimation; Superquadrics; Unified Modeling
Citation
Proceedings of the AAAI Conference on Artificial Intelligence, v.39, no.7, pp. 7437-7445
Pages
9
Indexed
FOREIGN
Journal Title
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
39
Number
7
Start Page
7437
End Page
7445
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/58328
DOI
10.1609/aaai.v39i7.32800
ISSN
2159-5399
2374-3468
Abstract
With the availability of egocentric 3D hand-object interaction datasets, there is increasing interest in developing unified models for hand-object pose estimation and action recognition. However, existing methods still struggle to recognise seen actions on unseen objects due to the limitations of 3D bounding boxes in representing object shape and movement. Additionally, the reliance on object templates at test time limits their generalisability to unseen objects. To address these challenges, we propose to leverage superquadrics as an alternative 3D object representation to bounding boxes and demonstrate their effectiveness on both template-free object reconstruction and action recognition tasks. Moreover, as we find that pure appearance-based methods can outperform the unified methods, the potential benefits from 3D geometric information remain unclear. Therefore, we study the compositionality of actions by considering a more challenging task where the training combinations of verbs and nouns do not overlap with the testing split. We extend the H2O and FPHA datasets with compositional splits and design a novel collaborative learning framework that can explicitly reason about the geometric relations between hands and the manipulated object. Through extensive quantitative and qualitative evaluations, we demonstrate significant improvements over the state of the art in (compositional) action recognition. Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Appears in Collections
ETC > 1. Journal Articles

Related Researcher

Kim, Ji Hie
College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)