57 results for “topic:vision-language-action-model”
[IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Development
A comprehensive list of resources on robot manipulation, including papers, code, and related websites.
Official code of Motus: A Unified Latent Action World Model
InternRobotics' open platform for building generalized navigation foundation models.
With only basic Python, build your own embodied AI robot from scratch; progressively build VLA/OpenVLA/SmolVLA/Pi0 from zero for a deep understanding of embodied intelligence.
[AAAI 2026] OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
[ICLR 2026] The official implementation of "Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model"
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation
Code for kai0, including training, inference, and data collection.
Official implementation of ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver.
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
[CVPR 2026] WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving
LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]
WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving
The official implementation of "DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation". (arXiv 2601.22153)
🔥 The first open-source diffusion vision-language-action model.
Open & Reproducible Research for Tracking VLAs
🔥 A curated list of research for "A Survey on Efficient Vision-Language-Action Models". We will continue to maintain and update the repository, so follow us to keep up with the latest developments!
A collection of vision-language-action model post-training methods.
A comprehensive list of resources on dual-system VLA models, including papers, code, and related websites.
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
LAP: Language-Action Pre-Training Enables Zero-Shot Cross Embodiment Transfer
[AAAI 2026] Release of code, datasets, and models for our work TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
[AAAI 2026] Official code for MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.
mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies