3D Question Answering (3D-QA)
7 papers with code • 2 benchmarks • 1 dataset
Most implemented papers
3D-LLM: Injecting the 3D World into Large Language Models
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.
PointLLM: Empowering Large Language Models to Understand Point Clouds
The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.
Towards Learning a Generalist Model for Embodied Navigation
We conduct extensive experiments to evaluate the performance and generalizability of our model.
ScanQA: 3D Question Answering for Spatial Scene Understanding
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA).
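In the 3D-QA setting that ScanQA introduces, each sample pairs a scene-level point cloud with a free-form natural-language question and ground-truth answers. A minimal sketch of one such sample follows; the field names and the scene identifier format are illustrative assumptions, not ScanQA's actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical minimal 3D-QA sample, for illustration only.
@dataclass
class ThreeDQASample:
    scene_id: str                              # scene identifier (format assumed)
    points: List[Tuple[float, float, float]]   # point-cloud xyz coordinates (colors omitted)
    question: str                              # free-form question about the scene
    answers: List[str]                         # ground-truth free-form answers

# Toy example with a two-point "scene"; real scenes contain many thousands of points.
sample = ThreeDQASample(
    scene_id="scene0000_00",
    points=[(0.0, 0.0, 0.0), (1.0, 0.5, 0.2)],
    question="What color is the chair next to the table?",
    answers=["brown"],
)
```

A model for this task consumes `points` and `question` and is evaluated on whether its predicted answer matches any string in `answers`.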
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and the limited diversity of visual content hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are used in the ScanQA and SQA datasets).