# Awesome Human Data for Manipulation
A curated list of papers and datasets on using human data (egocentric videos, VR/teleop, motion capture, hand pose, edited/robotized videos, etc.) to pretrain and scale robot manipulation foundation models.
## Contents

- [Taxonomy](#taxonomy)
- [Papers](#papers)
- [Datasets](#datasets)
## Taxonomy

Categories are not mutually exclusive — many works combine multiple.
### 1) Retargeting

Use estimated human motion (hands/wrists/body) and explicitly map it to a robot's action space (IK / optimization / kinematic alignment).
Typical outputs: robot joint actions, end-effector poses, dexterous hand commands.
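At its core, retargeting is a geometric mapping followed by IK. A minimal sketch of the mapping step (function name, frames, and the fixed scale factor are all illustrative assumptions; a real pipeline would solve IK against the robot model and enforce joint/workspace limits):

```python
import numpy as np

def retarget_wrist(wrist_pos, human_shoulder, robot_base, scale=0.8):
    """Express the human wrist relative to the shoulder, scale it to the
    robot's reach, and re-anchor it at the robot base (kinematic alignment).
    The returned end-effector target would then go to an IK solver."""
    rel = np.asarray(wrist_pos, float) - np.asarray(human_shoulder, float)
    return np.asarray(robot_base, float) + scale * rel

# Human wrist at (0.5, 0.2, 0.1) relative to world, shoulder at origin,
# robot base at (1.0, 0.0, 0.5):
target = retarget_wrist([0.5, 0.2, 0.1], [0.0, 0.0, 0.0], [1.0, 0.0, 0.5])
```

Orientation retargeting (wrist rotation to end-effector rotation) and per-finger mapping for dexterous hands follow the same pattern with rotation representations instead of points.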
### 2) Human Embodiment (Train w/o Retargeting)

Treat humans as another "embodiment" during training by learning in a human action space (e.g., hand pose / fingertip positions / MANO parameters) and transferring via modular adapters or shared representations.
Key theme: design an action representation that is meaningful for both humans and robots.
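A common choice of shared action space is fingertip positions: human videos supervise it via hand pose estimation, and robot rollouts via forward kinematics. A minimal sketch, assuming a MANO/MediaPipe-style 21-keypoint hand layout (the fingertip index list is an assumption of that layout):

```python
import numpy as np

# Assumed fingertip indices in a 21-keypoint hand layout
# (tips of thumb, index, middle, ring, pinky).
FINGERTIP_IDX = [4, 8, 12, 16, 20]

def fingertip_action(keypoints):
    """Reduce a (21, 3) hand pose to a flat 15-D fingertip action vector
    that can label both human demonstrations and robot hand states."""
    kp = np.asarray(keypoints, float).reshape(21, 3)
    return kp[FINGERTIP_IDX].reshape(-1)

action = fingertip_action(np.arange(63))  # dummy pose for shape-checking
```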
### 3) Data Editing / Synthetic Retargeting

Transform human videos into robot-compatible training data (compositing robot arms, inpainting humans away, pose-conditioned rendering, etc.) to reduce visual and/or embodiment gaps.
Key theme: keep scene/task semantics while making the visuals "robot-like".
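At the pixel level this often reduces to masked compositing: remove human pixels (inpainting, approximated below by a stored background plate) and overlay a robot rendered at the retargeted pose. A toy sketch with hypothetical inputs:

```python
import numpy as np

def robotize_frame(frame, human_mask, background, robot_render, robot_mask):
    """Replace human pixels with the background plate (a stand-in for
    learned inpainting), then composite the rendered robot arm on top.
    All images are (H, W, 3); masks are (H, W) booleans."""
    out = np.where(human_mask[..., None], background, frame)
    out = np.where(robot_mask[..., None], robot_render, out)
    return out

frame = np.zeros((4, 4, 3), np.uint8)          # original human frame
bg = np.full((4, 4, 3), 100, np.uint8)         # clean background plate
robot = np.full((4, 4, 3), 200, np.uint8)      # rendered robot layer
human_mask = np.zeros((4, 4), bool); human_mask[:2] = True
robot_mask = np.zeros((4, 4), bool); robot_mask[0, 0] = True
edited = robotize_frame(frame, human_mask, bg, robot, robot_mask)
```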
### 4) World Models / Predictive Representations

Pretrain models that predict future visual states (video diffusion / autoregressive video) and use these predictive representations to condition or unify policies.
Key theme: leverage human/robot videos to learn dynamics priors.
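The pattern can be caricatured in a few lines: roll a learned latent dynamics model forward, pool the predicted futures into a representation, and condition the policy on it. Everything below (dimensions, random weights, tanh dynamics) is a stand-in for a pretrained video/dynamics model, not any particular paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W_dyn = 0.1 * rng.normal(size=(8, 8))   # stand-in for a pretrained dynamics model
W_pol = 0.1 * rng.normal(size=(7, 8))   # policy head over predictive features

def predict_latents(z0, horizon=4):
    """Autoregressively predict `horizon` future latent states from z0."""
    zs, z = [], np.asarray(z0, float)
    for _ in range(horizon):
        z = np.tanh(W_dyn @ z)          # predicted next latent state
        zs.append(z)
    return np.stack(zs)

def act(z0):
    """Condition the action on pooled predicted futures, not just z0."""
    feats = predict_latents(z0).mean(axis=0)  # predictive representation
    return W_pol @ feats

a = act(np.ones(8))
```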
## Papers

*Format:* Title (Year). Authors/Venue [Paper] [Project] [Code] [Data]
### Retargeting

- EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos (2025). [arXiv] [Project] [Code] [Benchmark]
- MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies (2025). [arXiv] [Project] [Code] [Dataset]
- Humanoid Policy ~ Human Policy (2025). [arXiv] [Project] [Code] [Data] [Hardware]
- In-N-On: Scaling Egocentric Manipulation with In-the-Wild and On-Task Data (2025). [arXiv] [Project]
### Human Embodiment (Train w/o Retargeting)

- Humanoid Manipulation Interface (HuMI): Humanoid Whole-Body Manipulation from Robot-Free Demonstrations (2026). [arXiv] [Project] [PDF]
- EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration (2026). [arXiv] [Project] (Code: coming soon)
- Emergence of Human to Robot Transfer in Vision-Language-Action Models (2025). [Paper] [Blog]
- EgoMimic: Scaling Imitation Learning via Egocentric Video (2024). [arXiv] [Project] [Code] [Dataset]
- H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation (2025). [arXiv] [Project] [Code] [Model]
- EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video (2025). [arXiv] [Project] [Code]
### Data Editing / Synthetic Retargeting

- H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos (2025). [arXiv] [Dataset (H2R-1M)]
- Phantom: Training Robots Without Robots Using Only Human Videos (2025). [arXiv] [Project] [Code]
- Masquerade: Learning from In-the-Wild Human Videos using Data-Editing (2025). [arXiv] [Project] [Code]
### World Models / Predictive Representations

- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos (2026). [arXiv] [Project] (Code: coming soon)
- LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion (2026). [arXiv] [Project] [Code] (Data/Checkpoints: coming soon)
- Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning (2026). [arXiv] [Project] [Code] [Dataset]
- CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos (2026). [arXiv] [Project] (Code: coming soon)
- Large Video Planner Enables Generalizable Robot Control (2025). [arXiv] [Project] [Code] [Hugging Face]
- Motus: A Unified Latent Action World Model (2025). [arXiv] [Project] [Code]
- World Models Can Leverage Human Videos for Dexterous Manipulation (2025). [arXiv] [Project]
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations (2025). [Project] [Code]
- RynnVLA-002: A Unified Vision-Language-Action and World Model (2025). [arXiv] [Code] [Model]
- UniVLA: Learning to Act Anywhere with Task-Centric Latent Actions (2025). [arXiv] [Code]
- AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems (2025). [arXiv] [Code]
## Datasets

### Egocentric Human Manipulation Datasets

- Ego4D (Meta) — large-scale egocentric video. [Website] [GitHub]
- EPIC-KITCHENS — egocentric cooking activities and actions. [Website]
- Ego-Exo4D — paired egocentric + exocentric video. [Website]
- HoloAssist — egocentric assistive tasks (AR/egocentric). [Website]
- HOT3D — egocentric tasks with hand/object annotations (Meta). [GitHub]
- HOI4D — 4D hand-object interaction dataset (includes egocentric views). [Website]
- TACO — tool-use/action-centric dataset. [Website]
### Human → Robot "Robotized Video" Datasets

### Human Embodiment / Hand-Action Supervision Datasets

- EgoDex dataset (paired with hand pose annotations; used for human-action pretraining). [Project] [Code]
- EgoMimic sample datasets (human + robot episodes in robomimic HDF5 format). [Dataset] [Paper]
- MotionTrans dataset (VR human tasks + robot tasks for cotraining). [Dataset] [Paper]
### Humanoid Manipulation Benchmarks & Datasets

- EgoVLA (Isaac Lab simulation benchmark for humanoid bimanual manipulation). [Benchmark] [Paper]
If you have real humanoid manipulation datasets (e.g., Unitree H1/G1 + dexterous hands, teleop logs, whole-body manipulation), please open a PR and add them here.