TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
We consider the task of 3D pose estimation and tracking of multiple people seen in an arbitrary number of camera feeds. We propose TesseTrack, a novel top-down approach that simultaneously reasons about multiple individuals' 3D body joint reconstructions and associations in space and time in a single end-to-end learnable framework. At the core of our approach is a novel spatio-temporal formulation that operates in a common voxelized feature space aggregated from single or multiple camera views. After a person detection step, a 4D CNN produces short-term person-specific representations, which are then linked across time by a differentiable matcher. The linked descriptions are then merged and deconvolved into 3D poses. This joint spatio-temporal formulation contrasts with previous piece-wise strategies that treat 2D pose estimation, 2D-to-3D lifting, and 3D pose tracking as independent sub-problems, each of which is error-prone when solved in isolation. Furthermore, unlike previous methods, TesseTrack is robust to changes in the number of camera views and achieves very good results even if only a single view is available at inference time. Quantitative evaluation of 3D pose reconstruction accuracy on standard benchmarks shows significant improvements over the state of the art. Evaluation of multi-person articulated 3D pose tracking in our novel evaluation framework demonstrates the superiority of TesseTrack over strong baselines.
CVPR 2021
Results from the Paper
Ranked #1 on 3D Human Pose Estimation on Panoptic (using extra training data)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
3D Multi-Person Pose Estimation | Campus | TesseTrack | PCP3D | 97.4 | #1
3D Pose Estimation | Human3.6M | TesseTrack | Average MPJPE (mm) | 18.7 | #1
3D Human Pose Estimation | Human3.6M | TesseTrack (Monocular) | Average MPJPE (mm) | 44.6 | #104
3D Human Pose Estimation | Human3.6M | TesseTrack (Monocular) | Using 2D ground-truth joints | No | #2
3D Human Pose Estimation | Human3.6M | TesseTrack (Monocular) | Multi-View or Monocular | Monocular | #1
3D Human Pose Estimation | Human3.6M | TesseTrack (Multi-View) | Average MPJPE (mm) | 18.7 | #6
3D Human Pose Estimation | Human3.6M | TesseTrack (Multi-View) | Using 2D ground-truth joints | No | #2
3D Human Pose Estimation | Human3.6M | TesseTrack (Multi-View) | Multi-View or Monocular | Multi-View | #1
3D Human Pose Tracking | Panoptic | TesseTrack | 3DMOTA | 94.1 | #1
3D Human Pose Estimation | Panoptic | TesseTrack Multi-View (5 views) | Average MPJPE (mm) | 7.3 | #1
3D Multi-Person Pose Estimation | Panoptic | TesseTrack | Average MPJPE (mm) | 7.3 | #1
3D Human Pose Estimation | Panoptic | TesseTrack Monocular | Average MPJPE (mm) | 18.9 | #5
3D Multi-Person Pose Estimation | Shelf | TesseTrack (paper) | PCP3D | 98.2 | #1
3D Multi-Person Pose Estimation | Shelf | TesseTrack (correct) | PCP3D | 97.9 | #4
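
For readers unfamiliar with the MPJPE metric reported above, the following is a minimal sketch of how mean per-joint position error is commonly computed: the average Euclidean distance between predicted and ground-truth 3D joints. The function name `mpjpe` and the toy coordinates are illustrative assumptions, not taken from the paper, and individual benchmarks may apply extra steps (such as root-joint alignment) that are not shown here.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: sequences of (x, y, z) joint positions in the same order
    and the same unit (e.g. millimetres). Returns the mean Euclidean
    distance in that unit. Hypothetical helper for illustration only.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Per-joint Euclidean distance, averaged over all joints.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy example with two joints (made-up values, in mm).
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(predicted, ground_truth)
```

In this toy case the first joint matches exactly and the second is 5 mm off, so the mean error is 2.5 mm. Lower values are better, which is why the multi-view results in the table are stronger than the monocular ones.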