[WIP] HDM - Home made Diffusion Models
HDM is a series of diffusion models (flow matching) trained from scratch on consumer-level hardware at a reasonable cost.
The HDM project aims to provide a small but usable base model that can serve various tasks, act as an experimentation platform, or even be used in practical applications.
Usage
ComfyUI
- Install this node: https://github.com/KohakuBlueleaf/HDM-ext
- Ensure the transformers library is >= 4.52
- If you install the node through a manager, this should be handled automatically.
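To confirm the transformers requirement is met, here is a quick sanity check (a minimal sketch, assuming a standard pip installation):

```python
import transformers
from packaging.version import Version

# The HDM-ext node requires transformers >= 4.52.
installed = Version(transformers.__version__)
assert installed >= Version("4.52"), (
    f"transformers {installed} is too old; run `pip install -U transformers`"
)
print("transformers", installed, "OK")
```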
Installation
For the local Gradio UI or diffusers pipeline inference, you will need to install this repository into your Python environment:
- Requirements:
  - python>=3.10, python<3.13 (3.13 or higher may not work with pytorch)
  - Correct NVIDIA driver/CUDA installed for triton to work
  - pytorch==2.7.x with triton 3.3.x
  - or, pytorch==2.8.x with triton 3.4.x
- Optional requirements:
  - TIPO (KGen): llama-cpp-python (may need a custom-built wheel)
  - liger-kernel: for fused SwiGLU (using torch.compile works as well)
  - LyCORIS: for LyCORIS finetuning
- Clone this repo
- Install this repo with the following extras:
  - fused: installs xformers/liger-kernel for fused operations
  - win: installs triton-windows so torch.compile works on Windows
  - tipo: installs tipo-kgen and llama.cpp for TIPO prompt generation
  - lycoris: installs LyCORIS for LyCORIS finetuning
- Download the model file hdm-xut-340M-1024px-note.safetensors to the ./models folder
- Start the Gradio app or check the diffusers pipeline inference script
```bash
git clone https://github.com/KohakuBlueleaf/HDM
cd HDM
python -m venv venv
source venv/bin/activate
# or venv\scripts\activate.ps1 for powershell
# You may want to install pytorch by yourself
# pip install -U torch torchvision xformers --index-url https://download.pytorch.org/whl/cu128
# use [..., win] if you are using windows, e.g. [fused,tipo,win]
# e.g: pip install -e .[fused,win]
pip install -e .
```

You can use `uv venv` and `uv pip install` as well, which will be far more efficient.
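After installation, a quick environment sanity check may help (a minimal sketch based on the version bounds listed above):

```python
import sys

import torch
import triton

# Version bounds from the requirements above:
# Python >= 3.10 and < 3.13; torch 2.7.x + triton 3.3.x, or torch 2.8.x + triton 3.4.x.
assert (3, 10) <= sys.version_info[:2] < (3, 13), sys.version
print("torch:", torch.__version__, "| triton:", triton.__version__)
print("CUDA available:", torch.cuda.is_available())
```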
Gradio UI
Once you have installed this library with the correct dependencies and downloaded the model into the ./models folder, run the following command:

```bash
python ./scripts/inference_fm.py
```
Diffusers pipeline
The hdm library provides a custom pipeline that utilizes diffusers' pipeline model format:
```python
import torch

import xut.env

# Enable/disable different backends for the XUT implementation.
# With vanilla/xformers disabled, XUT will use the PyTorch SDPA attention kernel.
xut.env.TORCH_COMPILE = True  # torch.compile for unit module
xut.env.USE_LIGER = False  # Use liger-kernel SwiGLU
xut.env.USE_VANILLA = False  # Use vanilla attention
xut.env.USE_XFORMERS = True  # Use xformers attention
xut.env.USE_XFORMERS_LAYERS = True  # Use xformers SwiGLU

from hdm.pipeline import HDMXUTPipeline

pipeline = (
    HDMXUTPipeline.from_pretrained(
        "KBlueLeaf/HDM-xut-340M-anime", trust_remote_code=True
    )
    .to("cuda:0")
    .to(torch.float16)
)
## Uncomment the following line to apply torch.compile to the whole backbone
# pipeline.apply_compile(mode="default", dynamic=True)

images = pipeline(
    # Prompts/negative prompts can be a list or a plain string
    prompts=["1girl, dragon girl, kimono, masterpiece, newest"] * 2,
    negative_prompts="worst quality, low quality, old, early",
    width=1024,
    height=1440,
    cfg_scale=3.0,
    num_inference_steps=24,
    # For camera_param and tread_gamma, check the Tech Report for more information.
    camera_param={
        "zoom": 1.0,
        "x_shift": 0.0,
        "y_shift": 0.0,
    },
    tread_gamma1=0.0,
    tread_gamma2=0.5,
).images
```
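As with standard diffusers pipelines, the returned images should be PIL images, so they can be saved directly (a minimal sketch; the output filenames are arbitrary):

```python
# Save the generated batch to disk.
for i, image in enumerate(images):
    image.save(f"hdm_output_{i}.png")
```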
Training/Finetuning
For both training and finetuning, use the scripts/train.py script with a correct TOML config.
For example, you can refer to config/train/hdm-xut-340M-ft.toml as an example LyCORIS finetuning config for the HDM-xut-340M 1024px model.
You will need to download the corresponding training_ckpt or safetensors file from the HuggingFace repo and fill the file path into model.model_path in the config file.
Then you can run the following command:

```bash
python ./scripts/train.py <train toml config path>
```

About the dataset: for simplicity, hdm.data.kohya.KohyaDataset supports the dataset format used by kohya-ss/sd-scripts, although the "repeat" functionality is not implemented yet.
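For illustration, the kohya-ss/sd-scripts format pairs each training image with a same-named .txt caption file. A minimal sketch to inspect such a folder (the path below is hypothetical):

```python
from pathlib import Path

# Hypothetical dataset folder in kohya-ss/sd-scripts style:
# 0001.png sits next to 0001.txt containing its caption tags.
dataset_dir = Path("./data/my-dataset")
for image_path in sorted(dataset_dir.glob("*.png")):
    caption_path = image_path.with_suffix(".txt")
    if caption_path.exists():
        caption = caption_path.read_text(encoding="utf-8").strip()
        print(f"{image_path.name}: {caption[:60]}")
```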
Next Plan
- UNet-based Hires-Fix/Refiner model
  - A new arch specially designed as an adaptive-resolution, text-guided latent refiner
- Use a more general dataset (around 40M scale)
  - Currently considering laion-coco-13m + gbc-10m + coyohd-11m + danbooru (40M total)
  - Will finetune from the HDM-xut-340M 256px or 512px ckpt to test this dataset
- Investigate the possibility of utilizing the MDM (Matryoshka Diffusion Models) technique
  - For example, the current arch has its best efficiency at 512x512, not 1024x1024, but with the MDM approach I can keep the 512px backbone and train some add-on arch to make it 1024px.
- Pretrain a slightly larger model (see tech report; the XUT-large, ~550M scale model)
- Pretrain a slightly smaller model (see tech report; the XUT-small, ~230M scale model)
- Pixel-space model
  - PixNerd
  - MDM
  - others...
License
This project is still under development; therefore, all the models, source code, text, documents, and any other media in this project are licensed under CC-BY-NC-SA 4.0 until development is finished.
For any usage that may require any kind of standalone, specialized license, please directly contact kohaku@kblueleaf.net
Cite
```bibtex
@misc{HDM,
  title={HDM: Improved UViT-like Architecture with a Special Recipe for Fast Pre-training on Consumer-Level Hardware},
  author={Shin-Ying Yeh},
  year={2025},
  month={August},
  howpublished={\url{https://github.com/KohakuBlueleaf/HDM}},
}
```