# Awesome Human Data for Manipulation
A curated list of papers and datasets on using human data (egocentric videos, VR/teleop, motion capture, hand pose, edited/robotized videos, etc.) to pretrain and scale robot manipulation foundation models.
## Contents

- [Taxonomy](#taxonomy)
- [Papers](#papers)
- [Datasets](#datasets)
## Taxonomy

Categories are not mutually exclusive — many works combine multiple.
### 1) Retargeting

Use estimated human motion (hands/wrists/body) and explicitly map it to a robot's action space (IK / optimization / kinematic alignment).
Typical outputs: robot joint actions, end-effector poses, dexterous hand commands.
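At its core, retargeting is a geometric mapping followed by IK. A minimal sketch of the mapping step (function name, frames, and the fixed scale factor are all illustrative assumptions; a real pipeline would solve IK against the robot model and enforce joint/workspace limits):

```python
import numpy as np

def retarget_wrist(wrist_pos, human_shoulder, robot_base, scale=0.8):
    """Express the human wrist relative to the shoulder, scale it to the
    robot's reach, and re-anchor it at the robot base (kinematic alignment).
    The returned end-effector target would then go to an IK solver."""
    rel = np.asarray(wrist_pos, float) - np.asarray(human_shoulder, float)
    return np.asarray(robot_base, float) + scale * rel

# Human wrist at (0.5, 0.2, 0.1) relative to world, shoulder at origin,
# robot base at (1.0, 0.0, 0.5):
target = retarget_wrist([0.5, 0.2, 0.1], [0.0, 0.0, 0.0], [1.0, 0.0, 0.5])
```

Orientation retargeting (wrist rotation to end-effector rotation) and per-finger mapping for dexterous hands follow the same pattern with rotation representations instead of points.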
### 2) Human Embodiment (Train w/o Retargeting)

Treat humans as another "embodiment" during training by learning in a human action space (e.g., hand pose / fingertip positions / MANO parameters) and transferring via modular adapters or shared representations.
Key theme: design an action representation that is meaningful for both humans and robots.
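A common choice of shared action space is fingertip positions: human videos supervise it via hand pose estimation, and robot rollouts via forward kinematics. A minimal sketch, assuming a MANO/MediaPipe-style 21-keypoint hand layout (the fingertip index list is an assumption of that layout):

```python
import numpy as np

# Assumed fingertip indices in a 21-keypoint hand layout
# (tips of thumb, index, middle, ring, pinky).
FINGERTIP_IDX = [4, 8, 12, 16, 20]

def fingertip_action(keypoints):
    """Reduce a (21, 3) hand pose to a flat 15-D fingertip action vector
    that can label both human demonstrations and robot hand states."""
    kp = np.asarray(keypoints, float).reshape(21, 3)
    return kp[FINGERTIP_IDX].reshape(-1)

action = fingertip_action(np.arange(63))  # dummy pose for shape-checking
```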
### 3) Data Editing / Synthetic Retargeting

Transform human videos into robot-compatible training data (compositing robot arms, inpainting humans away, pose-conditioned rendering, etc.) to reduce visual and/or embodiment gaps.
Key theme: keep scene/task semantics while making the visuals "robot-like".
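At the pixel level this often reduces to masked compositing: remove human pixels (inpainting, approximated below by a stored background plate) and overlay a robot rendered at the retargeted pose. A toy sketch with hypothetical inputs:

```python
import numpy as np

def robotize_frame(frame, human_mask, background, robot_render, robot_mask):
    """Replace human pixels with the background plate (a stand-in for
    learned inpainting), then composite the rendered robot arm on top.
    All images are (H, W, 3); masks are (H, W) booleans."""
    out = np.where(human_mask[..., None], background, frame)
    out = np.where(robot_mask[..., None], robot_render, out)
    return out

frame = np.zeros((4, 4, 3), np.uint8)          # original human frame
bg = np.full((4, 4, 3), 100, np.uint8)         # clean background plate
robot = np.full((4, 4, 3), 200, np.uint8)      # rendered robot layer
human_mask = np.zeros((4, 4), bool); human_mask[:2] = True
robot_mask = np.zeros((4, 4), bool); robot_mask[0, 0] = True
edited = robotize_frame(frame, human_mask, bg, robot, robot_mask)
```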
### 4) World Models / Predictive Representations

Pretrain models that predict future visual states (video diffusion / autoregressive video) and use these predictive representations to condition or unify policies.
Key theme: leverage human/robot videos to learn dynamics priors.
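The pattern can be caricatured in a few lines: roll a learned latent dynamics model forward, pool the predicted futures into a representation, and condition the policy on it. Everything below (dimensions, random weights, tanh dynamics) is a stand-in for a pretrained video/dynamics model, not any particular paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W_dyn = 0.1 * rng.normal(size=(8, 8))   # stand-in for a pretrained dynamics model
W_pol = 0.1 * rng.normal(size=(7, 8))   # policy head over predictive features

def predict_latents(z0, horizon=4):
    """Autoregressively predict `horizon` future latent states from z0."""
    zs, z = [], np.asarray(z0, float)
    for _ in range(horizon):
        z = np.tanh(W_dyn @ z)          # predicted next latent state
        zs.append(z)
    return np.stack(zs)

def act(z0):
    """Condition the action on pooled predicted futures, not just z0."""
    feats = predict_latents(z0).mean(axis=0)  # predictive representation
    return W_pol @ feats

a = act(np.ones(8))
```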
## Papers

*Format:* Title (Year). Authors/Venue [Paper] [Project] [Code] [Data]
### Retargeting

- EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos (2025). [arXiv] [Project] [Code] [Benchmark]
- MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies (2025). [arXiv] [Project] [Code] [Dataset]
- Humanoid Policy ~ Human Policy (2025). [arXiv] [Project] [Code] [Data] [Hardware]
- In-N-On: Scaling Egocentric Manipulation with In-the-Wild and On-Task Data (2025). [arXiv] [Project]
### Human Embodiment (Train w/o Retargeting)

- Humanoid Manipulation Interface (HuMI): Humanoid Whole-Body Manipulation from Robot-Free Demonstrations (2026). [arXiv] [Project] [PDF]
- EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration (2026). [arXiv] [Project] (Code: coming soon)
- Emergence of Human to Robot Transfer in Vision-Language-Action Models (2025). [Paper] [Blog]
- EgoMimic: Scaling Imitation Learning via Egocentric Video (2024). [arXiv] [Project] [Code] [Dataset]
- H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation (2025). [arXiv] [Project] [Code] [Model]
- EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video (2025). [arXiv] [Project] [Code]
### Data Editing / Synthetic Retargeting

- H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos (2025). [arXiv] [Dataset (H2R-1M)]
- Phantom: Training Robots Without Robots Using Only Human Videos (2025). [arXiv] [Project] [Code]
- Masquerade: Learning from In-the-Wild Human Videos using Data-Editing (2025). [arXiv] [Project] [Code]
### World Models / Predictive Representations

- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos (2026). [arXiv] [Project] (Code: coming soon)
- LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion (2026). [arXiv] [Project] [Code] (Data/Checkpoints: coming soon)
- Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning (2026). [arXiv] [Project] [Code] [Dataset]
- CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos (2026). [arXiv] [Project] (Code: coming soon)
- Large Video Planner Enables Generalizable Robot Control (2025). [arXiv] [Project] [Code] [Hugging Face]
- Motus: A Unified Latent Action World Model (2025). [arXiv] [Project] [Code]
- World Models Can Leverage Human Videos for Dexterous Manipulation (2025). [arXiv] [Project]
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations (2025). [Project] [Code]
- RynnVLA-002: A Unified Vision-Language-Action and World Model (2025). [arXiv] [Code] [Model]
- UniVLA: Learning to Act Anywhere with Task-Centric Latent Actions (2025). [arXiv] [Code]
- AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems (2025). [arXiv] [Code]
## Datasets

### Egocentric Human Manipulation Datasets

- Ego4D (Meta) — large-scale egocentric video. [Website] [GitHub]
- EPIC-KITCHENS — egocentric cooking activities and actions. [Website]
- Ego-Exo4D — paired egocentric + exocentric video. [Website]
- HoloAssist — egocentric assistive tasks (AR/egocentric). [Website]
- HOT3D — egocentric tasks with hand/object annotations (Meta). [GitHub]
- HOI4D — 4D hand-object interaction dataset (includes egocentric views). [Website]
- TACO — tool-use/action-centric dataset. [Website]
### Human → Robot "Robotized Video" Datasets

### Human Embodiment / Hand-Action Supervision Datasets

- EgoDex dataset (paired with hand pose annotations; used for human-action pretraining). [Project] [Code]
- EgoMimic sample datasets (human + robot episodes in robomimic HDF5 format). [Dataset] [Paper]
- MotionTrans dataset (VR human tasks + robot tasks for cotraining). [Dataset] [Paper]
### Humanoid Manipulation Benchmarks & Datasets

- EgoVLA (Isaac Lab simulation benchmark for humanoid bimanual manipulation). [Benchmark] [Paper]
If you have real humanoid manipulation datasets (e.g., Unitree H1/G1 + dexterous hands, teleop logs, whole-body manipulation), please open a PR and add them here.