3D Question Answering (3D-QA)
7 papers with code • 2 benchmarks • 1 dataset
Most implemented papers
3D-LLM: Injecting the 3D World into Large Language Models
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.
PointLLM: Empowering Large Language Models to Understand Point Clouds
The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.
Towards Learning a Generalist Model for Embodied Navigation
We conduct extensive experiments to evaluate the performance and generalizability of our model.
ScanQA: 3D Question Answering for Spatial Scene Understanding
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA).
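In the 3D-QA setting that ScanQA introduces, each sample pairs a scene-level point cloud with a free-form natural-language question and ground-truth answers. A minimal sketch of one such sample follows; the field names and the scene identifier format are illustrative assumptions, not ScanQA's actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical minimal 3D-QA sample, for illustration only.
@dataclass
class ThreeDQASample:
    scene_id: str                              # scene identifier (format assumed)
    points: List[Tuple[float, float, float]]   # point-cloud xyz coordinates (colors omitted)
    question: str                              # free-form question about the scene
    answers: List[str]                         # ground-truth free-form answers

# Toy example with a two-point "scene"; real scenes contain many thousands of points.
sample = ThreeDQASample(
    scene_id="scene0000_00",
    points=[(0.0, 0.0, 0.0), (1.0, 0.5, 0.2)],
    question="What color is the chair next to the table?",
    answers=["brown"],
)
```

A model for this task consumes `points` and `question` and is evaluated on whether its predicted answer matches any string in `answers`.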
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and the limited diversity of visual content hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are used in the ScanQA and SQA datasets).