Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
This is a curated list of research on "Embodied AI or robots with Large Language Models". Watch this repository for the latest updates! 🔥
Autonomous Agents (LLMs) research papers. Updated Daily.
awesome grounding: A curated list of research papers in visual grounding
Build your own embodied-intelligence robot from scratch with only basic Python; step through building VLA/OpenVLA/SmolVLA/Pi0 from zero for a deep understanding of embodied AI
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
Democratization of RT-2 "RT-2: New model translates vision and language into action"
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
RAI is a vendor-agnostic agentic framework for Physical AI robotics, utilizing ROS 2 tools for complex actions, defined scenarios, free interface execution, log summaries, voice interaction, and more.
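Since RAI builds on ROS 2, the flavor of integration it automates can be sketched with plain rclpy. This is a minimal sketch, not RAI's actual API: the node, topic name, and message type below are assumptions for illustration only.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String  # standard ROS 2 string message


class CommandBridge(Node):
    """Hypothetical node that forwards agent-generated text commands to a ROS 2 topic."""

    def __init__(self):
        super().__init__("agent_command_bridge")
        # Topic name is an assumption; RAI defines its own interfaces.
        self.pub = self.create_publisher(String, "agent/commands", 10)

    def send(self, text: str) -> None:
        msg = String()
        msg.data = text
        self.pub.publish(msg)


def main():
    rclpy.init()
    node = CommandBridge()
    node.send("navigate to charging dock")
    # A real node would call rclpy.spin(node) to keep processing callbacks.
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```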
An open source framework for research in Embodied-AI from AI2.
Odyssey: Empowering Minecraft Agents with Open-World Skills
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Seamlessly integrate state-of-the-art transformer models into robotics stacks
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
🧠 A production-ready cognitive foundation for autonomous systems such as OpenClaw and Embodied-AI: memory management from extraction and search to automated optimization, with an API, MCP, CLI, and insights dashboard out of the box.
[arXiv 2023] Embodied Task Planning with Large Language Models
A collection of vision-language-action model post-training methods.
[CVPR 2025 Highlight🔥] Official code repository for "Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning"
A unified, agentic system for general-purpose robots, enabling multi-modal perception, mapping and localization, autonomous mobility and manipulation, and intelligent user interaction.
[IROS'25 Oral & NeurIPS'24 Workshop] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"
[NeurIPS'25] TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
OceanGym: A Benchmark Environment for Underwater Embodied Agents
Teaching Vision-Language Models as Progress Estimators across Embodied Scenarios
[NeurIPS 2024] GenRL: Multimodal-foundation world models ground language and video prompts in embodied domains by turning them into sequences of latent world-model states. These latent sequences can be decoded with the model's decoder, so the expected behavior can be visualized before the agent is trained to execute it.
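The GenRL entry describes a concrete pipeline: embed a prompt into the world model's latent space, roll out a sequence of latent states, decode them for a visual preview, then train the agent on that rollout. A toy sketch of the flow, with every class and method name invented for illustration (the actual GenRL code differs):

```python
import torch


class LatentWorldModel:
    """Toy stand-in for a multimodal-foundation world model (illustrative only)."""

    def __init__(self, latent_dim: int = 32):
        self.latent_dim = latent_dim

    def embed_prompt(self, prompt: str) -> torch.Tensor:
        # Real model: align the text/video prompt with the latent space.
        torch.manual_seed(abs(hash(prompt)) % (2**31))
        return torch.randn(self.latent_dim)

    def rollout(self, z0: torch.Tensor, horizon: int) -> torch.Tensor:
        # Real model: imagine a latent state sequence conditioned on the prompt.
        return torch.stack([z0 + 0.1 * t * torch.randn_like(z0) for t in range(horizon)])

    def decode(self, z_seq: torch.Tensor) -> torch.Tensor:
        # Real model: decode latent states into frames to preview the behavior.
        return z_seq  # placeholder "frames"


model = LatentWorldModel()
z0 = model.embed_prompt("walk to the red door")
latents = model.rollout(z0, horizon=8)  # grounded target behavior
frames = model.decode(latents)          # inspect before training the agent to execute it
print(frames.shape)                     # torch.Size([8, 32])
```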
Official repository of the paper "Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms"
Official Repo of LangSuitE
[ICLR 2025 Spotlight] Official PyTorch Implementation of "What Makes a Good Diffusion Planner for Decision Making?"
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
[AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"
🦾 Set up your embodied LLM agent with the same ease as normal agents in CrewAI or Autogen
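For comparison, the "normal" agent setup in CrewAI that this entry references looks roughly as follows. The embodied framework's own API is not shown here; the role, goal, and task text are invented, and running this assumes an LLM credential (e.g. OPENAI_API_KEY) is configured.

```python
from crewai import Agent, Task, Crew

# A single LLM agent defined declaratively, CrewAI-style.
navigator = Agent(
    role="Navigator",
    goal="Plan a collision-free path to the kitchen",
    backstory="An embodied robot assistant operating indoors.",
)

task = Task(
    description="Move from the living room to the kitchen and report obstacles.",
    expected_output="A step-by-step navigation plan.",
    agent=navigator,
)

crew = Crew(agents=[navigator], tasks=[task])
result = crew.kickoff()  # runs the task with the configured LLM
print(result)
```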