Zhoues/MineDreamer
[IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
🥰 If you are interested in our work, feel free to star ⭐ or watch our repo for the latest updates 🤗!
🔥 Updates
[2025-06-16] 🔥🔥🔥 MineDreamer (a.k.a. DecisionDreamer) has been selected as an Oral Presentation at IROS 2025!
[2024-04-03] The MineDreamer code is released. Enjoy the imagination ability of the embodied agent!
[2024-03-19] MineDreamer is released on arXiv.
[2024-03-15] The project page is set up here.
Try MineDreamer
The code and checkpoints are released. The open-source contents include:
- MineDreamer agent and baseline code (i.e., VPT, STEVE-1, Multi-Modal Memory).
- MineDreamer Goal Drift Dataset and MineDreamer weights, including the MineDreamer-7B Imaginator and the Prompt Generator.
- MineDreamer training scripts, covering Imaginator training stages 2 and 3.
Note: For Imaginator training stage 1, we only provide the pre-trained Q-Former weights. For the Prompt Generator, we only provide the weights; if you want to train your own Prompt Generator, please refer to STEVE-1 for data collection and training.
Directory Structure:
- `README.md`
- `minedreamer/`: all agent code, including the baselines and MineDreamer.
- `imaginator/`: all Imaginator code, including training and inference.
- `play/`: scripts for running the agent in all evaluations.
  - `programmatic/`: runs the inference code of the Programmatic Evaluation.
  - `chaining/`: runs the inference code of the Command-Switching Evaluation.
- `scripts/`: scripts for training and inference of the Imaginator.
- `download_baseline_weights.sh`: downloads the baseline weights.
- `download_minedreamer_weights.sh`: downloads the MineDreamer weights and other pre-trained weights for Imaginator training.
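The commands in the steps below call scripts by relative path (e.g. `sh play/programmatic/...`, `bash scripts/...`), so they are meant to be run from the repository root. The following is a minimal POSIX shell sketch, not part of the repository, that checks a directory against the top-level layout listed above; the function name `check_repo_root` is hypothetical.

```shell
# Hypothetical helper (not part of the MineDreamer repo): check that a directory
# matches the top-level layout listed above, so relative script paths resolve.
check_repo_root() {
  root="${1:-.}"   # directory to inspect; defaults to the current directory
  missing=0
  # Top-level directories from the directory structure.
  for d in minedreamer imaginator play scripts; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      missing=1
    fi
  done
  # Top-level download scripts.
  for f in download_baseline_weights.sh download_minedreamer_weights.sh; do
    if [ ! -f "$root/$f" ]; then
      echo "missing file: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_repo_root || echo "cd into your MineDreamer checkout first"` before running any of the scripts. It reports every missing entry rather than stopping at the first one, so a partial clone is easy to diagnose.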
Model Zoo and Dataset
We provide MineDreamer models for you to play with, including checkpoints from all three training stages, as well as the dataset. They can be downloaded from the following links:
| model | training stage | size | HF weights 🤗 | HF dataset 🤗 |
|---|---|---|---|---|
| Pre-trained Q-Former | 1 | 261MB | Pretrained-QFormer | |
| InstructPix2Pix U-Net | 2 | 3.44GB | InstructPix2Pix-Unet | Goal-Drift-Dataset |
| MineDreamer-Imaginator-7B | 3 | 17.7GB | MineDreamer-7B | Goal-Drift-Dataset |
Step 1: Install MineRL Env and Run Baseline
If you only want to train or test the Imaginator, you can skip Step 1.
- We provide two methods for installing the MineRL environment; detailed instructions can be found in this repo. Please make sure you complete the final test, otherwise the agent will not function correctly.
- Download the weights (baseline weights + Prompt Generator weights): `sh download_baseline_weights.sh`
- Run the baseline. If you use a cluster such as Slurm, replace `sudo` with `srun -p <your virtual partition> --gres=gpu:1`.
  - Normal Installation Procedure, headful server: `sh play/programmatic/steve1_play_w_text_prompt.sh mine_block_wood`
  - Normal Installation Procedure, headless server: `sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood`
  - Container installation: `sudo apptainer exec -w --nv --bind /path/to/MineDreamer:/path/to/MineDreamer vgl-env sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood`
  - Container installation with GPU rendering: `sudo apptainer exec -w --nv --bind /path/to/MineDreamer:/path/to/MineDreamer vgl-env bash setupvgl.sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood`
  Then, in `data/play`, you will find the intermediate outputs and the final video of the agent acting according to the instruction.
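The container commands above differ only in the launcher prefix: `sudo` on a local machine, `srun -p <your virtual partition> --gres=gpu:1` on Slurm. Below is a minimal sketch of a wrapper that picks the prefix for the headless container case. The function `run_in_container` and the variables `SLURM_PARTITION`, `REPO` and `DRY_RUN` are hypothetical names chosen for illustration; the repository does not read them.

```shell
# Hypothetical wrapper (not part of the repo): run the headless baseline script
# inside the `vgl-env` container, prefixing `sudo` locally or `srun` on Slurm.
# Assumed variables: SLURM_PARTITION (set it to use srun), REPO (repo path,
# default: current directory), DRY_RUN=1 (print the command instead of running).
run_in_container() {
  task="$1"                      # e.g. mine_block_wood
  repo="${REPO:-$PWD}"
  if [ -n "${SLURM_PARTITION:-}" ]; then
    set -- srun -p "$SLURM_PARTITION" --gres=gpu:1
  else
    set -- sudo
  fi
  # Append the container invocation used in the instructions above.
  set -- "$@" apptainer exec -w --nv --bind "$repo:$repo" vgl-env \
    sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh "$task"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 SLURM_PARTITION=<your virtual partition> run_in_container mine_block_wood` prints the `srun ...` command without launching it. Building the command with `set --` keeps the sketch POSIX-compatible (no bash arrays) while still quoting each argument correctly.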
Step 2: Install Imaginator Env and Run MineDreamer Agent
This codebase has strict environment requirements; we recommend following the tutorial below step by step.
- We recommend running on Linux in a conda environment with Python 3.9: `conda create -n imaginator python=3.9`
- Install PyTorch for CUDA 11.8: `pip install --pre torch==2.2.0.dev20231010+cu118 torchvision==0.17.0.dev20231010+cu118 torchaudio==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118`
  - Note: These nightly torch versions may change over time. If you get an error saying the requested version does not exist, pick an available version based on the error message.
- Install additional packages: `pip install -r requirements.txt`
- Install DeepSpeed: `DS_BUILD_AIO=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed`
  - Note: This step often fails because it requires specific versions of CUDA and GCC; CUDA 11.8 (`cuda118`) and `gcc-7.5.0` are expected. To ensure later scripts run without errors, add the commands that activate these versions to your `~/.bashrc` file. Below is a reference for the content to include in `~/.bashrc` (the `/mnt/petrelfs/share/...` paths are cluster-specific; replace them with your own GCC and CUDA installation paths):
    - `export LD_LIBRARY_PATH=/mnt/petrelfs/share/gcc/gcc-7.5.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}`
    - `export PATH=/mnt/petrelfs/share/gcc/gcc-7.5.0/bin:$PATH`
    - `export PATH=/mnt/petrelfs/share/cuda-11.8/bin:$PATH`
    - `export LD_LIBRARY_PATH=/mnt/petrelfs/share/cuda-11.8/lib64:$LD_LIBRARY_PAT ...`
  - Upon installation, you can run `ds_report`. If the output includes the following line, the installation is correct: `fused_adam ............. [YES] ...... [OKAY]`
- Download the weights (Imaginator weights + pre-trained weights for training) with `sh download_minedreamer_weights.sh`, then remove the original LoRA parameters from Hugging Face's LLaVA with `bash scripts/pre_llava.sh`.
- Try running inference with the Imaginator (and InstructPix2Pix). Generated images are saved in the `inference_valid_*` folder.
  - InstructPix2Pix: `bash scripts/inference_IP2P.sh`
  - Imaginator: `bash scripts/inference_MineDreamer.sh`
- To run the MineDreamer agent, first launch the Imaginator backend service:
  - InstructPix2Pix: `bash scripts/minedreamer_backend_IP2P.sh`
  - Imaginator: `bash scripts/minedreamer_backend_MLLMSD.sh`
  The backend then prints an address similar to `Running on http://10.140.1.104:25547 (Press CTRL+C to quit)`. Insert this address into the `dreamer_url` field of `minedreamer/play/config/programmatic/mine_block_wood.yaml`, for example: `dreamer_url: http://10.140.1.104:25547/`
- Run the MineDreamer agent. The process is the same as running the baseline in Step 1, except that you execute the `*_dreamer_play_w_text_prompt.sh` script instead.
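Copying the backend address into the config by hand is easy to get wrong when the backend is restarted on a different node. Below is a minimal sketch, assuming the backend prints a `Running on http://HOST:PORT ...` line as described above and that the YAML file contains a `dreamer_url:` key on its own line. `set_dreamer_url` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repo): copy the URL from the backend's
# "Running on http://HOST:PORT (Press CTRL+C to quit)" line into dreamer_url.
set_dreamer_url() {
  log_line="$1"
  config="${2:-minedreamer/play/config/programmatic/mine_block_wood.yaml}"
  # Pull the first http(s) URL out of the log line.
  url=$(printf '%s\n' "$log_line" | grep -oE 'https?://[^ ]+' | head -n 1)
  if [ -z "$url" ]; then
    echo "no URL found in: $log_line" >&2
    return 1
  fi
  url="${url%/}/"   # keep exactly one trailing slash, as in the example above
  if ! grep -q '^[[:space:]]*dreamer_url:' "$config"; then
    echo "no dreamer_url key in $config" >&2
    return 1
  fi
  # Rewrite the key's value, preserving indentation; write to a temp file and
  # move it back instead of `sed -i` (whose syntax differs on GNU and BSD sed).
  sed "s|^\([[:space:]]*\)dreamer_url:.*|\1dreamer_url: $url|" "$config" \
    > "$config.tmp" && mv "$config.tmp" "$config"
}
```

For example, `set_dreamer_url "Running on http://10.140.1.104:25547 (Press CTRL+C to quit)"` would set `dreamer_url: http://10.140.1.104:25547/` in the default config path. The helper refuses to touch the file if it cannot find a URL or the `dreamer_url` key, rather than appending a guessed entry.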
Step 3: Train your own Imaginator
- First, download the Goal Drift Dataset, place it in the `data/mllm_diffusion_dataset` directory, and unzip it.
- To train the U-Net parameters of InstructPix2Pix, run `bash scripts/train_InstructPix2Pix_minecraft.sh`. The resulting checkpoint can also be used as a baseline.
- To train Imaginator-7B, run `bash scripts/train_MineDreamer.sh`.
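Since the dataset is expected to be unzipped under `data/mllm_diffusion_dataset`, a quick pre-flight check can catch a forgotten unzip before a long training job starts. A minimal sketch follows; `check_dataset` is hypothetical, and it only verifies that something other than a `.zip` archive exists in that directory, not the dataset's actual layout.

```shell
# Hypothetical pre-flight check (not part of the repo): make sure the Goal Drift
# Dataset has been unzipped into data/mllm_diffusion_dataset before training.
# It only checks that some non-.zip entry exists, not the dataset's real layout.
check_dataset() {
  dir="${1:-data/mllm_diffusion_dataset}"
  if [ ! -d "$dir" ]; then
    echo "dataset directory not found: $dir" >&2
    return 1
  fi
  # Look for at least one extracted file or folder (anything but the archive).
  first=$(find "$dir" -mindepth 1 ! -name '*.zip' -print | head -n 1)
  if [ -z "$first" ]; then
    echo "nothing extracted in $dir (did you unzip the archive?)" >&2
    return 1
  fi
  return 0
}
```

For example, `check_dataset && bash scripts/train_InstructPix2Pix_minecraft.sh` only starts training once the directory contains extracted content.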
🕶️ Overview
The Overview of Chain-of-Imagination within MineDreamer
The Overview Framework of Imaginator within MineDreamer
📹 Demo Videos and Imagination Visual Results
More demo videos and Imagination visual results are on our project webpage.
Imagination Visual Results on Evaluation Set Compared to the Baseline
Imagination Visual Results While the Agent Solves Open-ended Tasks
Building a more generalist embodied agent
A generalist embodied agent should have a high-level planner capable of perception and planning in an open world, as well as a low-level controller able to act in complex environments. The MineDreamer agent can steadily follow short-horizon text instructions, making it suitable as a low-level controller that generates control signals. For the high-level planner, including perception and task planning in an open world, one can turn to MP5 (CVPR 2024), whose code is also released. MP5 is adept at tasks that require long-horizon sequencing and extensive environmental awareness. Combining MP5 with MineDreamer is therefore a promising approach to building more generalist embodied agents.
Acknowledgment
This repository is built upon the codebase of LLaVA, STEVE-1 and SmartEdit.
Citation
If you find MineDreamer and MP5 useful for your research and applications, please cite them using the following BibTeX:
@article{zhou2024minedreamer,
title={MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control},
author={Zhou, Enshen and Qin, Yiran and Yin, Zhenfei and Huang, Yuzhou and Zhang, Ruimao and Sheng, Lu and Qiao, Yu and Shao, Jing},
journal={arXiv preprint arXiv:2403.12037},
year={2024}
}
@inproceedings{qin2024mp5,
title={MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception},
author={Qin, Yiran and Zhou, Enshen and Liu, Qichang and Yin, Zhenfei and Sheng, Lu and Zhang, Ruimao and Qiao, Yu and Shao, Jing},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={16307--16316},
year={2024}
}




