3D Object Captioning

5 papers with code • 1 benchmark • 1 dataset

3D object captioning involves generating a natural language description of an object, given its point cloud representation.
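The task's input and output can be sketched in a few lines of Python. This is only an illustration, assuming a point cloud stored as a list of (x, y, z) tuples; the helper names, the `Captioner` interface, the default of 1024 points, and the `PlaceholderCaptioner` stand-in are all hypothetical and are not taken from any of the papers listed on this page. Centering, scaling to the unit sphere, and subsampling to a fixed size are common preprocessing steps, but each model defines its own.

```python
import math
import random
from typing import List, Protocol, Tuple

Point = Tuple[float, float, float]


def normalize_to_unit_sphere(points: List[Point]) -> List[Point]:
    """Center the cloud at its centroid and scale so the farthest point has radius 1."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    radius = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if radius == 0.0:
        return centered  # degenerate cloud: all points coincide
    return [(x / radius, y / radius, z / radius) for x, y, z in centered]


def subsample(points: List[Point], k: int, seed: int = 0) -> List[Point]:
    """Randomly keep at most k points so every object has a bounded input size."""
    if len(points) <= k:
        return list(points)
    return random.Random(seed).sample(points, k)


class Captioner(Protocol):
    """Hypothetical interface: any model that maps a point cloud to text."""

    def caption(self, points: List[Point]) -> str: ...


class PlaceholderCaptioner:
    """Stand-in only: reports the cloud size instead of running a real model."""

    def caption(self, points: List[Point]) -> str:
        return f"an object represented by {len(points)} points"


def caption_object(points: List[Point], model: Captioner, num_points: int = 1024) -> str:
    """Preprocess the cloud, then ask the model for a description."""
    prepared = normalize_to_unit_sphere(subsample(points, num_points))
    return model.caption(prepared)
```

A real system would replace `PlaceholderCaptioner` with a trained 3D-language model; the surrounding pipeline shape (point cloud in, sentence out) is the part this sketch is meant to show.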

Benchmarks

These leaderboards are used to track progress in 3D Object Captioning.

Dataset	Best Model
Objaverse	MiniGPT-3D

Libraries

Use these libraries to find 3D Object Captioning models and implementations.

qizekun/ShapeLLM

3 papers

Pointcept/GPT4Point

2 papers

Datasets

Objaverse

Most implemented papers

3D-LLM: Injecting the 3D World into Large Language Models

umass-foundation-model/3d-llm • NeurIPS 2023

Experiments on the authors' held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that the model outperforms 2D VLMs.

PointLLM: Empowering Large Language Models to Understand Point Clouds

openrobotlab/pointllm • 31 Aug 2023

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

qizekun/ShapeLLM • 27 Feb 2024

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.

View Selection for 3D Captioning via Diffusion Ranking

crockwell/cap3d • 11 Apr 2024

Scalable annotation approaches are crucial for constructing extensive 3D-text datasets, facilitating a broader range of applications.

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

tangyuan96/minigpt-3d • 2 May 2024

Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800.

