Contextual Associated Triplet Queries for Panoptic Scene Graph Generation

ACM Multimedia Asia 2024 · Jingbin Xu, Junwen Chen, Keiji Yanai

The Panoptic Scene Graph Generation (PSG) task aims to extract triplets composed of a subject, an object, and a relation on top of panoptic segmentation. Among one-stage methods, PSGTR predicts the subject, object, and relation with a single query. However, such an integrated query is too implicit to identify instance pairs and their relations simultaneously. PSGFormer learns instance and relation queries separately and matches subject-relation and object-relation pairs by using the relation as an index. Nevertheless, this indirect matching can impede finding the optimal match. To address these issues, we propose a new one-stage method, Contextual Associated Triplet Queries (CATQ), which employs three branches to decode subject, object, and relation features separately. Additionally, we leverage instance information to guide the relation decoding process. Furthermore, we introduce a triplet context fusion block to enable the extraction of more comprehensive instance pairs and triplet relations. Our method achieves 34.8 Recall@20 and 20.9 mRecall@20, surpassing the state-of-the-art baseline by 22.5% and 26.0%, respectively, while using half of the training schedule.
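
The reported gains can be sanity-checked with a little arithmetic. The sketch below assumes the two percentages are relative improvements over the baseline, which the abstract does not state explicitly, and backs out the baseline scores they would imply. The function name `implied_baseline` is hypothetical and used only for illustration.

```python
def implied_baseline(score: float, relative_gain_pct: float) -> float:
    """Back out a baseline score from a reported score and a relative gain.

    Assumption: the gain is relative, i.e. score = baseline * (1 + gain / 100).
    """
    return score / (1.0 + relative_gain_pct / 100.0)


# Figures taken from the abstract: 34.8 Recall@20 (+22.5%), 20.9 mRecall@20 (+26.0%).
recall_baseline = implied_baseline(34.8, 22.5)
mrecall_baseline = implied_baseline(20.9, 26.0)

print(f"implied baseline Recall@20:  {recall_baseline:.1f}")
print(f"implied baseline mRecall@20: {mrecall_baseline:.1f}")
```

Under that assumption, the implied baseline is roughly 28.4 Recall@20 and 16.6 mRecall@20. If the percentages were instead meant as absolute point differences, the implied baselines would be 12.3 and negative, which is not plausible, so the relative reading is the more likely one.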
