Interacting Hand-Object Pose Estimation via Dense Mutual Attention

16 Nov 2022 · Rong Wang, Wei Mao, Hongdong Li

3D hand-object pose estimation is the key to the success of many computer vision applications. The main focus of this task is to effectively model the interaction between the hand and an object. To this end, existing works either rely on interaction constraints in a computationally-expensive iterative optimization, or consider only a sparse correlation between sampled hand and object keypoints. In contrast, we propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object. Specifically, we first construct the hand and object graphs according to their mesh structures. For each hand node, we aggregate features from every object node by the learned attention and vice versa for each object node. Thanks to such dense mutual attention, our method is able to produce physically plausible poses with high quality and real-time inference speed. Extensive quantitative and qualitative experiments on large benchmark datasets show that our method outperforms state-of-the-art methods. The code is available at https://github.com/rongakowang/DenseMutualAttention.git.
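
To make the mechanism concrete, below is a minimal sketch of dense mutual (cross) attention between hand and object graph node features, assuming batched per-node features have already been extracted from the two mesh graphs. The class name, feature dimensions, and the use of torch.nn.MultiheadAttention are illustrative assumptions, not the authors' exact implementation; see the linked repository for the official code.

```python
# Sketch of a dense mutual attention layer between hand and object graph nodes.
# Shapes and module choices are assumptions for illustration.
import torch
import torch.nn as nn


class DenseMutualAttention(nn.Module):
    """Every hand node attends to every object node, and vice versa."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Hand nodes as queries, object nodes as keys/values.
        self.hand_from_obj = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Object nodes as queries, hand nodes as keys/values.
        self.obj_from_hand = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_hand = nn.LayerNorm(dim)
        self.norm_obj = nn.LayerNorm(dim)

    def forward(self, hand_feats: torch.Tensor, obj_feats: torch.Tensor):
        # hand_feats: (B, N_hand, dim), obj_feats: (B, N_obj, dim)
        hand_upd, _ = self.hand_from_obj(hand_feats, obj_feats, obj_feats)
        obj_upd, _ = self.obj_from_hand(obj_feats, hand_feats, hand_feats)
        # Residual connections keep each node's own graph feature.
        hand_feats = self.norm_hand(hand_feats + hand_upd)
        obj_feats = self.norm_obj(obj_feats + obj_upd)
        return hand_feats, obj_feats


if __name__ == "__main__":
    layer = DenseMutualAttention(dim=64)
    hand = torch.randn(2, 778, 64)   # e.g. MANO hand mesh vertices as nodes
    obj = torch.randn(2, 1000, 64)   # sampled object mesh vertices as nodes
    h, o = layer(hand, obj)
    print(h.shape, o.shape)          # (2, 778, 64) (2, 1000, 64)
```

The key point is that the attention is dense: every hand node can aggregate information from every object node (and vice versa), rather than from a sparse set of sampled keypoints.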

Datasets

HO-3D · DexYCB

Results from the Paper


Ranked #2 on hand-object pose on HO-3D (using extra training data)

Task                     Dataset  Model  Metric Name               Metric Value  Global Rank
hand-object pose         DexYCB   DMA    Average MPJPE (mm)        12.7          #3
hand-object pose         DexYCB   DMA    Procrustes-Aligned MPJPE  6.86          #3
hand-object pose         DexYCB   DMA    OCE                       27.3          #4
hand-object pose         DexYCB   DMA    MCE                       32.6          #2
hand-object pose         DexYCB   DMA    ADD-S                     15.9          #2
hand-object pose         HO-3D    DMA    Average MPJPE (mm)        22.2          #2
hand-object pose         HO-3D    DMA    ST-MPJPE                  23.8          #2
hand-object pose         HO-3D    DMA    PA-MPJPE                  10.1          #3
hand-object pose         HO-3D    DMA    OME                       45.5          #2
hand-object pose         HO-3D    DMA    ADD-S                     20.8          #2
3D Hand Pose Estimation  HO-3D    DMA    Average MPJPE (mm)        22.2          #2
3D Hand Pose Estimation  HO-3D    DMA    ST-MPJPE (mm)             23.8          #6
3D Hand Pose Estimation  HO-3D    DMA    PA-MPJPE (mm)             10.1          #7

Methods


No methods listed for this paper.