3D Object Captioning
5 papers with code • 1 benchmark • 1 dataset
3D object captioning involves generating a natural language description of an object, given its point cloud representation.
Most implemented papers
3D-LLM: Injecting the 3D World into Large Language Models
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
PointLLM: Empowering Large Language Models to Understand Point Clouds
The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.
View Selection for 3D Captioning via Diffusion Ranking
Scalable annotation approaches are crucial for constructing extensive 3D-text datasets, facilitating a broader range of applications.
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800.