
priest-yang/Awesome-Human-Data-For-Manipulation

Awesome Human Data for Manipulation


A curated list of papers and datasets on using human data (egocentric videos, VR/teleop, motion capture, hand pose, edited/robotized videos, etc.) to pretrain and scale robot manipulation foundation models.


Contents

  • Taxonomy
  • Papers
  • Datasets

Taxonomy

Categories are not mutually exclusive — many works combine multiple.

1) Retargeting

Use estimated human motion (hands/wrists/body) and explicitly map it to a robot’s action space (IK / optimization / kinematic alignment).

Typical outputs: robot joint actions, end-effector poses, dexterous hand commands.
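As a minimal illustration of the retargeting idea (all names and numbers hypothetical; real pipelines solve full IK or optimization over hand keypoints), a sketch mapping an estimated human wrist pose and thumb-index pinch distance to an end-effector target and gripper command:

```python
import math

def retarget(wrist_xyz, thumb_tip_xyz, index_tip_xyz,
             human_to_robot_offset=(0.0, 0.0, 0.1), max_gripper_width=0.08):
    """Map an estimated human wrist/finger pose to a robot end-effector command.

    A deliberately simplified stand-in for the IK/optimization step described
    above: the wrist position is shifted into the robot frame by a fixed
    calibration offset, and the thumb-index distance is clipped to the
    gripper's stroke to produce a parallel-jaw width command.
    """
    ee_target = tuple(w + o for w, o in zip(wrist_xyz, human_to_robot_offset))
    pinch = math.dist(thumb_tip_xyz, index_tip_xyz)
    gripper_width = min(pinch, max_gripper_width)
    return ee_target, gripper_width
```

Dexterous-hand retargeting replaces the gripper-width heuristic with per-finger kinematic alignment, but the structure (perception output in, robot action space out) is the same.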

2) Human Embodiment (Train w/o Retargeting)

Treat humans as another “embodiment” during training by learning in a human action space (e.g., hand pose / fingertip positions / MANO) and transfer via modular adapters or shared representations.

Key theme: design an action representation that is meaningful for both humans and robots.
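One way to picture the shared-action-space idea (all names hypothetical): both embodiments express actions as 3D fingertip positions, and thin per-embodiment adapters translate to and from native commands, so a single policy can be trained on mixed human/robot data without explicit retargeting:

```python
# Sketch of a shared action representation: humans and robots both emit
# actions as lists of 3D fingertip positions; per-embodiment adapters
# translate to/from native formats (names are illustrative).

HUMAN, ROBOT = "human", "robot"

def to_shared(embodiment, raw_action):
    """Convert a native action into the shared fingertip-position space."""
    if embodiment == HUMAN:
        # e.g. keep the fingertip subset of estimated MANO keypoints
        return raw_action["fingertips"]
    # e.g. robot forward kinematics would produce fingertip positions;
    # here we assume they are precomputed
    return raw_action["fk_fingertips"]

def from_shared(embodiment, fingertips):
    """Decode shared-space actions back into a native command dict."""
    if embodiment == ROBOT:
        return {"ik_targets": fingertips}  # solved by IK downstream
    return {"fingertips": fingertips}

# A policy trained in the shared space can then drive either side:
shared = to_shared(HUMAN, {"fingertips": [(0.1, 0.0, 0.2)]})
cmd = from_shared(ROBOT, shared)
```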

3) Data Editing / Synthetic Retargeting

Transform human videos into robot-compatible training data (compositing robot arms, inpainting humans away, pose-conditioned rendering, etc.) to reduce visual and/or embodiment gaps.

Key theme: keep scene/task semantics while making the visuals “robot-like”.
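A toy version of the compositing step (pure Python, a single grayscale "frame" as a nested list; real pipelines composite rendered robot meshes and use learned inpainting):

```python
def composite_robot(frame, robot_layer, mask):
    """Overlay a rendered robot layer onto a human-video frame.

    frame, robot_layer, and mask are same-sized 2D lists; mask[i][j] == 1
    marks pixels where the (hypothetical) rendered robot arm replaces the
    human. Inpainting-based variants would first fill the human region
    with plausible background before compositing.
    """
    return [
        [robot_layer[i][j] if mask[i][j] else frame[i][j]
         for j in range(len(frame[0]))]
        for i in range(len(frame))
    ]
```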

4) World Models / Predictive Representations

Pretrain models that predict future visual states (video diffusion / autoregressive video) and use these predictive representations to condition or unify policies.

Key theme: leverage human/robot videos to learn dynamics priors.
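The conditioning pattern can be sketched as follows (both classes are hypothetical stand-ins): a predictor pretrained on human/robot video rolls out future latents from the current observation, and the policy consumes those predicted latents alongside the observation:

```python
class VideoPredictor:
    """Stand-in for a video diffusion / autoregressive model pretrained on
    human and robot videos; predicts k future latent states."""
    def predict_latents(self, obs, k=4):
        # Dummy dynamics: a real model would roll out learned visual futures.
        return [obs + i + 1 for i in range(k)]

class PredictiveConditionedPolicy:
    """Policy conditioned on predictive representations, not raw obs alone."""
    def __init__(self, predictor):
        self.predictor = predictor

    def act(self, obs):
        futures = self.predictor.predict_latents(obs)
        # A real policy head would map (obs, futures) to an action chunk;
        # here we just summarize the predicted trajectory.
        return {"obs": obs, "predicted_mean": sum(futures) / len(futures)}

policy = PredictiveConditionedPolicy(VideoPredictor())
action = policy.act(0.0)
```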


Papers

Format: Title (Year). Authors/Venue [Paper] [Project] [Code] [Data]

Retargeting


Human Embodiment (Train w/o Retargeting)

  • Humanoid Manipulation Interface (HuMI): Humanoid Whole-Body Manipulation from Robot-Free Demonstrations (2026).
    [arXiv] [Project] [PDF]

  • EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration (2026).
    [arXiv] [Project] (Code: coming soon)

  • Emergence of Human to Robot Transfer in Vision-Language-Action Models (2025).
    [Paper] [Blog]

  • EgoMimic: Scaling Imitation Learning via Egocentric Video (2024).
    [arXiv] [Project] [Code] [Dataset]

  • H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation (2025).
    [arXiv] [Project] [Code] [Model]

  • EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video (2025).
    [arXiv] [Project] [Code]


Data Editing / Synthetic Retargeting


World Models / Predictive Representations

  • DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos (2026).
    [arXiv] [Project] (Code: coming soon)

  • LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion (2026).
    [arXiv] [Project] [Code] (Data/Checkpoints: coming soon)

  • Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning (2026).
    [arXiv] [Project] [Code] [Dataset]

  • CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos (2026).
    [arXiv] [Project] (Code: coming soon)

  • Large Video Planner Enables Generalizable Robot Control (2025).
    [arXiv] [Project] [Code] [Hugging Face]

  • Motus: A Unified Latent Action World Model (2025).
    [arXiv] [Project] [Code]

  • World Models Can Leverage Human Videos for Dexterous Manipulation (2025).
    [arXiv] [Project]

  • Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations (2025).
    [Project] [Code]

  • RynnVLA-002: A Unified Vision-Language-Action and World Model (2025).
    [arXiv] [Code] [Model]

  • UniVLA: Learning to Act Anywhere with Task-Centric Latent Actions (2025).
    [arXiv] [Code]

  • AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems (2025).
    [arXiv] [Code]


Datasets

Egocentric Human Manipulation Datasets

  • Ego4D (Meta) — large-scale egocentric video.
    [Website] [GitHub]
  • EPIC-KITCHENS — egocentric cooking activities and actions.
    [Website]
  • Ego-Exo4D — paired egocentric + exocentric.
    [Website]
  • HoloAssist — egocentric assistive tasks (AR/egocentric).
    [Website]
  • HOT3D — egocentric tasks with hand/object annotations (Meta).
    [GitHub]
  • HOI4D — 4D hand-object interaction dataset (includes egocentric views).
    [Website]
  • TACO — tool-use/action-centric dataset.
    [Website]

Human → Robot “Robotized Video” Datasets

  • H2R-1M — human videos with robot arms composited/rendered into scenes.
    [Dataset] [Paper]

Human Embodiment / Hand-Action Supervision Datasets

  • EgoDex dataset (paired with hand pose annotations; used for human-action pretraining).
    [Project] [Code]
  • EgoMimic sample datasets (human + robot episodes in robomimic HDF5 format).
    [Dataset] [Paper]
  • MotionTrans dataset (VR human tasks + robot tasks for cotraining).
    [Dataset] [Paper]

Humanoid Manipulation Benchmarks & Datasets

  • EgoVLA (Isaac Lab simulation benchmark for humanoid bimanual manipulation).
    [Benchmark] [Paper]

If you have real humanoid manipulation datasets (e.g., Unitree / H1 / G1 + dexterous hands, teleop logs, whole-body manipulation), please open a PR and add them here.