The official PyTorch code for MoRo: Masked Modeling for Human Motion Recovery Under Occlusions
Masked Modeling for Human Motion Recovery Under Occlusions
Project Page | Paper
Zhiyin Qian,
Siwei Zhang,
Bharat Lal Bhatnagar,
Federica Bogo,
Siyu Tang
3DV 2026
Installation
git clone https://github.com/mikeqzy/MoRo
conda env create -f environment.yml
conda activate moro
Data preparation
SMPL(-X) body model
We use smplfitter to fit the non-parametric mesh to SMPL-X parameters. Please follow their provided script to download the body model files and put them under body_models.
Additionally, download the mesh connection matrices for the SMPL-X topology (used by the fully convolutional mesh autoencoder in Mesh-VQ-VAE) and other regressors used for evaluation here. Please also extract and put them under body_models.
Tokenization
We train the tokenizer on AMASS, MOYO, and BEDLAM. Download the SMPL-X neutral annotations from their official project pages and unzip the files.
We preprocess the datasets with the scripts at models/mesh_vq_vae/data/preprocess; please change the dataset paths accordingly.
MoRo
MoRo is trained on a mixture of datasets prepared as follows:
- AMASS: We preprocess AMASS into 30 fps sequences following RoHM; please refer to the instructions here.
- MPII, Human3.6M, MPI-INF-3DHP, COCO: The training dataset uses the SMPL annotations from BEDLAM. Follow the instructions here in the section "Training CLIFF model with real images" to obtain the required training images and annotations. We further convert the SMPL annotations to SMPL-X using the provided script at scripts/preprocess/process_hmr_smplx.py.
- BEDLAM: Download the BEDLAM dataset from their official project page. We use the SMPL-X neutral annotations for training.
- EgoBody: First, download the EgoBody dataset from the official project page. Additionally, download keypoints_cleaned and mask_joint from here and egobody_occ_info.csv from here, then place them under the dataset directory. Finally, run the provided preprocessing script at scripts/preprocess/process_egobody_bbox.py to generate the bounding box files.
Testing data
Additionally, we test our method on PROX and RICH. Follow the dataset-specific steps below to place files in the expected locations.
- RICH: Download the RICH dataset from their official project page. The preprocessed annotations can be downloaded from the GVHMR repo. Put the hmr4d_support folder under the RICH dataset directory.
- PROX: First, download the PROX dataset from their official project page. Additionally, download keypoints_openpose and mask_joint from here and place them under the dataset directory. Finally, run the provided preprocessing script at scripts/preprocess/process_prox_bbox.py to generate the bounding box files.
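The bounding-box preprocessing scripts are provided in the repo; as a rough, hypothetical illustration of what deriving a box from 2D keypoints typically involves (the function name, confidence threshold, and padding factor below are assumptions, not the repo's actual implementation):

```python
import numpy as np

def bbox_from_keypoints(keypoints, conf_thresh=0.3, scale=1.2):
    """Compute a padded square bounding box (cx, cy, size) from
    OpenPose-style keypoints of shape (N, 3) = (x, y, confidence).
    Returns None if no keypoint passes the confidence threshold."""
    valid = keypoints[keypoints[:, 2] > conf_thresh]
    if len(valid) == 0:
        return None
    x_min, y_min = valid[:, :2].min(axis=0)
    x_max, y_max = valid[:, :2].max(axis=0)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    # Pad the tight box so the full body stays inside the crop.
    size = max(x_max - x_min, y_max - y_min) * scale
    return cx, cy, size
```

Low-confidence joints (e.g. occluded limbs) are excluded so they do not distort the crop.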
Checkpoints
Make sure the required pretrained weights and released checkpoints are placed at the exact paths below.
The checkpoint for Mesh-VQ-VAE and MoRo can be downloaded here. Place the tokenizer checkpoint at ckpt/tokenizer/tokenizer.ckpt and MoRo checkpoint at exp/mask_transformer/MIMO-vit-release/video_train/checkpoints/last.ckpt.
To train from scratch, we use the pretrained ViT backbone weights from 4DHumans (download the model from the Training section of the official repo). Place them at ckpt/backbones/vit_pose_hmr2.pth.
Structure
The data should be organized as follows:
MoRo
├── body_models
│ ├── smpl
│ ├── smplh
│ ├── smplx
│ ├── smplx_ConnectionMatrices
│ ├── J_regressor_h36m.npy
│ ├── smpl_neutral_J_regressor.pt
│ ├── smplx2smpl_sparse.pt
├── ckpt
│ ├── backbones
│ │ ├── vit_pose_hmr2.pth
│ ├── tokenizer
│ │ ├── tokenizer.ckpt
├── exp
│ ├── mask_transformer
│ │ ├── MIMO-vit-release
│ │ │ ├── video_train
│ │ │ │ ├── checkpoints
│ │ │ │ │ ├── last.ckpt
├── datasets
│ ├── mesh_vq_vae
│ │ ├── bedlam_animations
│ │ ├── AMASS_smplx
│ │ ├── MOYO
│ ├── mask_transformer
│ │ ├── AMASS
│ │ ├── BEDLAM
│ │ ├── coco
│ │ ├── h36m_train
│ │ ├── mpi-inf-3dhp
│ │ ├── mpii
│ │ ├── EgoBody
│ │ ├── PROX
│ │ ├── rich
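Before running the demo or training, it can help to confirm the layout above is in place. The following is a small hypothetical helper (not part of the repo) that checks a few key paths; extend the list with the datasets you actually use:

```python
from pathlib import Path

# A few key paths taken from the layout above.
REQUIRED = [
    "body_models/smplx",
    "ckpt/tokenizer/tokenizer.ckpt",
    "exp/mask_transformer/MIMO-vit-release/video_train/checkpoints/last.ckpt",
]

def missing_paths(root, required=REQUIRED):
    """Return the subset of required paths that do not exist under root."""
    root = Path(root)
    return [p for p in required if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths("."):
        print(f"missing: {p}")
```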
Demo
For a quick demo on a custom video taken with a static camera, run the following command on a 30 fps video:
python demo.py option=demo demo.video_path=/path/to/demo.mp4 demo.name=<video_name> demo.focal_length=<focal_length>
or an image directory with sorted frames:
python demo.py demo.video_path=/path/to/image_dir demo.name=<video_name> demo.focal_length=<focal_length>
By default, the rendering results are saved to ./exp/mask_transformer/MIMO-vit-release/video_train, the same directory as the released model checkpoint.
The focal length is optional; if not provided, it is estimated via HumanFOV from CameraHMR.
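If the camera's horizontal field of view is known, the focal length in pixels follows from the standard pinhole relation f = (W / 2) / tan(FOV / 2); this is generic camera geometry, not this repo's estimation code:

```python
import math

def focal_from_fov(fov_deg, image_width_px):
    """Pinhole model: focal length in pixels from horizontal FOV
    (degrees) and image width (pixels)."""
    return (image_width_px / 2) / math.tan(math.radians(fov_deg) / 2)

# e.g. a 90-degree horizontal FOV on a 1920 px wide frame gives f = 960 px
```

The resulting value can be passed to the demo via demo.focal_length.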
Training
Training consists of (1) training the mesh tokenizer and (2) multi-stage training for MoRo. Configuration locations are listed in each subsection.
Tokenization
You can train the mesh tokenizer by running:
python train_mesh_vqvae.py
The configuration file is at configs/mesh_vq_vae/config.yaml.
MoRo
We adopt a multi-stage training strategy for MoRo:
# Stage 1: Pose pretraining
python train_mask_transformer.py option=pose_pretrain tag=default
# Stage 2: Motion pretraining
python train_mask_transformer.py option=motion_pretrain tag=default
# Stage 3: Image pretraining on image datasets
python train_mask_transformer.py option=image_pretrain tag=default
# Stage 4: Video pretraining on video datasets
python train_mask_transformer.py option=video_pretrain tag=default
# Stage 5: Finetuning on video datasets
python train_mask_transformer.py option=video_train tag=default
The base configuration file is at configs/mask_transformer/config.yaml; the specific options for each training stage can be found in configs/option.
The training logs and checkpoints will be saved under exp/mask_transformer/MIMO-vit-<tag>/<stage>.
Testing and Evaluation
Inference writes results to exp/mask_transformer/MIMO-vit-<tag>/video_train, and the evaluation scripts then read from the corresponding result directories.
We set tag=release here to reproduce the results reported in the paper.
EgoBody
python train_mask_transformer.py option=[inference,video_train] tag=release data=egobody
python eval_egobody.py --saved_data_dir=./exp/mask_transformer/MIMO-vit-release/video_train/result_egobody/inference_5_1 --recording_name=all --render
RICH
python train_mask_transformer.py option=[inference,video_train] tag=release data=rich
python eval_rich.py --saved_data_dir=./exp/mask_transformer/MIMO-vit-release/video_train/result_rich/inference_5_1 --seq_name=all --render
PROX
python train_mask_transformer.py option=[inference,video_train] tag=release data=prox
python eval_egobody.py --saved_data_dir=./exp/mask_transformer/MIMO-vit-release/video_train/result_prox/inference_5_1 --recording_name=all --render
Acknowledgements
This work was supported as part of the Swiss AI initiative by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID #36 on Alps, enabling large-scale training.
Some code in this repository is adapted from the following repositories:
Citation
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{qian2026moro,
  title={Masked Modeling for Human Motion Recovery Under Occlusions},
  author={Qian, Zhiyin and Zhang, Siwei and Bhatnagar, Bharat Lal and Bogo, Federica and Tang, Siyu},
  booktitle={3DV},
  year={2026}
}
