3D Question Answering (3D-QA)

7 papers with code • 2 benchmarks • 1 dataset

3D Question Answering (3D-QA) is the task of answering natural-language questions about 3D scenes, typically represented as point clouds or RGB-D scans (see ScanQA, CVPR 2022).

Most implemented papers

3D-LLM: Injecting the 3D World into Large Language Models

umass-foundation-model/3d-llm NeurIPS 2023

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

ziyuguo99/point-bind_point-llm 1 Sep 2023

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.
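As a rough illustration of this kind of alignment (a minimal sketch with hypothetical encoder placeholders, not the Point-Bind code), one can train a point-cloud encoder so its embeddings match a frozen multi-modal embedding space with a contrastive objective:

```python
# Minimal sketch of aligning a point-cloud encoder to a shared multi-modal
# embedding space via an InfoNCE-style contrastive loss. Encoders are
# placeholders; this is not the actual Point-Bind implementation.
import torch
import torch.nn.functional as F

def info_nce(point_emb, anchor_emb, temperature=0.07):
    """Pull each point-cloud embedding toward its paired anchor embedding
    (e.g., an image embedding) and away from other anchors in the batch."""
    point_emb = F.normalize(point_emb, dim=-1)
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    logits = point_emb @ anchor_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# Usage sketch (hypothetical modules):
#   point_clouds: (B, N, 3) xyz coordinates; images: paired renderings.
#   loss = info_nce(point_encoder(point_clouds),
#                   frozen_anchor_encoder(images).detach())
```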

PointLLM: Empowering Large Language Models to Understand Point Clouds

openrobotlab/pointllm 31 Aug 2023

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

qizekun/ShapeLLM 27 Feb 2024

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.

Towards Learning a Generalist Model for Embodied Navigation

zd11024/NaviLLM 4 Dec 2023

We conduct extensive experiments to evaluate the performance and generalizability of our model.

ScanQA: 3D Question Answering for Spatial Scene Understanding

atr-dbi/scanqa CVPR 2022

We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA).
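To illustrate the task setup, a 3D-QA example pairs a natural-language question with a 3D scene and one or more reference answers. The field names and the exact-match metric below are illustrative assumptions, not the ScanQA schema:

```python
# Minimal sketch of a 3D-QA example and a simple exact-match metric.
from dataclasses import dataclass
import numpy as np

@dataclass
class ThreeDQAExample:
    scene_id: str             # e.g., a ScanNet scene identifier
    point_cloud: np.ndarray   # (N, 6) xyz + rgb points for the scene
    question: str             # natural-language question about the scene
    answers: list[str]        # reference answers (free-form strings)

def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the normalized prediction matches any reference answer."""
    norm = lambda s: " ".join(s.lower().strip().split())
    return float(norm(prediction) in {norm(a) for a in references})
```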

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

matthewdm0816/bridgeqa 24 Feb 2024

In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and the limited diversity of visual content hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are used in the ScanQA and SQA datasets).
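As a rough illustration of 2D-3D fusion for answer prediction (an assumed late-fusion head, not the BridgeQA architecture), one might combine a 2D feature from rendered scene views with a 3D point-cloud feature and classify over an answer vocabulary:

```python
# Illustrative late-fusion answer head: concatenate 2D view, 3D scene, and
# question features, then predict answer logits. Purely a sketch.
import torch
import torch.nn as nn

class FusionVQAHead(nn.Module):
    def __init__(self, dim_2d, dim_3d, dim_q, num_answers, hidden=512):
        super().__init__()
        self.proj = nn.Linear(dim_2d + dim_3d + dim_q, hidden)
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, feat_2d, feat_3d, feat_q):
        fused = torch.cat([feat_2d, feat_3d, feat_q], dim=-1)  # simple concat fusion
        return self.classifier(torch.relu(self.proj(fused)))   # answer logits
```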