46 results for “topic:spatial-intelligence”
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
🌐 3D and 4D World Modeling: A Survey
InternRobotics' open platform for building generalized navigation foundation models.
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
[NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
[ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
[CVPR 2025] Source code for the paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning"
[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D
[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
[CVPR 2026] WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
Visual Spatial Tuning
Scaling Spatial Intelligence with Multimodal Foundation Models
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
[ICCV 2025] Perspective-Invariant 3D Object Detection
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
🌐 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
[NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
[CVPR 2025] Code for "StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation".
[CVPR 2026] Thinking in 360°: Humanoid Visual Search in the Wild
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
[Awesome-Spatial-VLMs] This repository is the official, community-maintained resource for the survey paper "Spatial Intelligence in Vision-Language Models: A Comprehensive Survey".
[ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.14357
Multimodal datasets for spatial intelligence
[ICRA 2026] Official codebase for NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
🌐 A Roadmap for 3D Scene Understanding in the Wild