
KsanaDiT

High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation

English | ็ฎ€ไฝ“ไธญๆ–‡

๐Ÿ“– Introduction

KsanaDiT is a high-performance inference framework designed specifically for Diffusion Transformers (DiT), supporting video generation (T2V/I2V) and image generation (T2I) tasks. The framework provides a rich set of optimization techniques and flexible configuration options, enabling efficient execution of large-scale DiT models in single-GPU or multi-GPU environments.

โœจ Key Features

  • ๐Ÿš€ High-Performance Inference: FP8 quantization, QKV Fuse, Torch Compile, and various attention optimizations
  • ๐ŸŽฏ Multiple Attention Backends: SLA Attention, Flash Attention, Sage Attention, Radial Sage Attention, Torch SDPA
  • ๐ŸŽฌ Multi-Modal Generation: Text-to-Video (T2V), Image-to-Video (I2V), Video Controllable Editing (Vace), Text-to-Image (T2I)
  • ๐Ÿ’พ Smart Caching: Built-in caching strategies (DBCache, EasyCache, MagCache, TeaCache, CustomStepCache, HybridCache)
  • ๐Ÿ”ง Flexible Configuration: LoRA support, multiple samplers (Euler, UniPC, DPM++), custom sigma scheduling
  • ๐ŸŒ Distributed Support: Single-GPU, multi-GPU (torchrun), Ray distributed inference, Model Pool management
  • ๐Ÿ”Œ ComfyUI Integration: ComfyUI node support (standalone submodule) for visual workflow design
  • ๐Ÿ–ฅ๏ธ Multi-Platform Support: GPU, NPU, XPU (WIP)

๐Ÿ“ฆ Supported Models

Video Generation Models

| Model | Type | Parameters | Tasks | Status |
|---|---|---|---|---|
| Turbo Diffusion | Image-to-Video | 14B | I2V | โœ… |
| Wan2.2-T2V | Text-to-Video | 5B/14B | T2V | โœ… |
| Wan2.2-I2V | Image-to-Video | 14B | I2V | โœ… |
| Wan2.1-Vace | Video Controllable Editing | 14B | Vace | โœ… |

Image Generation Models

| Model | Type | Parameters | Tasks | Status |
|---|---|---|---|---|
| Qwen-Image | Text-to-Image | 20B | T2I | โœ… |
| Qwen-Image Edit | Image Editing | 20B | Image Edit | โœ… |

๐Ÿ› ๏ธ Installation

Docker

We are actively working on Dockerfiles. Stay tuned!

Requirements

  • Python: >= 3.10, < 4.0
  • PyTorch: >= 2.0
  • GPU Environment:
    • CUDA >= 12.8
    • Recommended: NVIDIA GPUs
  • NPU Environment:
    • CANN >= 8.0
    • torch_npu adapter
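
A quick sanity check that the environment above is in place (a minimal sketch; it only assumes PyTorch is installed):

import torch

print(torch.__version__)          # expect >= 2.0
print(torch.version.cuda)         # CUDA version PyTorch was built against (expect >= 12.8)
print(torch.cuda.is_available())  # True if an NVIDIA GPU is usable from this process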

Basic Installation

# Clone the repository
git clone https://github.com/Tencent/KsanaDiT.git
cd KsanaDiT

# Install base dependencies (GPU version by default)
pip install -e .

GPU Accelerated Installation

# Install GPU optimization dependencies (recommended)
pip install -e ".[gpu]"

# Or install manually
pip install "xformers>=0.0.29" "flash-attn>=2.6.0" "triton>=3.2.0"

NPU Environment Installation

# 1. Install CANN toolkit (refer to official documentation)
# https://www.hiascend.com/software/cann

# 2. Install torch_npu
pip install torch-npu

# 3. Install KsanaDiT (NPU version)
pip install -e ".[npu]"

# 4. Verify NPU environment
python -c "import torch_npu; print(torch_npu.npu.is_available())"

Release Installation

Direct installation via wheel packages is coming soon.

๐Ÿ”Œ Interface Support

KsanaDiT can be used in several ways, depending on your scenario:

Local Pipeline Mode

Run locally through the Python Pipeline API, suitable for scripted batch generation or integration into your own systems:

from ksana import KsanaPipeline

# Create inference pipeline
pipeline = KsanaPipeline.from_models("path/to/model")

# Generate video/image
result = pipeline.generate(prompt, ...)

For detailed usage, refer to Quick Start and the examples directory.

ComfyUI Integration

KsanaDiT supports usage as ComfyUI custom nodes, providing a visual workflow experience:

# 1. Navigate to ComfyUI's custom_nodes directory
cd /path/to/ComfyUI/custom_nodes

# 2. Clone the KsanaDiT repository
git clone https://github.com/Tencent/KsanaDiT.git

# 3. Enter the KsanaDiT directory and install dependencies
cd KsanaDiT
./scripts/install_dev.sh

After installation, restart ComfyUI and you will see KsanaDiT-related nodes in the node list. For more ComfyUI usage instructions, refer to comfyui/README.md.

๐Ÿš€ Quick Start

For detailed code examples, refer to examples.

Text-to-Video (T2V)

import torch
from ksana import KsanaPipeline
from ksana.config import (
    KsanaDistributedConfig,
    KsanaRuntimeConfig,
    KsanaSampleConfig,
)

# Create inference pipeline
pipeline = KsanaPipeline.from_models(
    "path/to/Wan2.2-T2V-A14B",
    dist_config=KsanaDistributedConfig(num_gpus=1)
)

# Generate video
video = pipeline.generate(
    "Street photography, cool girl with headphones skateboarding, New York streets, graffiti wall background",
    sample_config=KsanaSampleConfig(steps=40),
    runtime_config=KsanaRuntimeConfig(
        seed=1234,
        size=(720, 480),
        frame_num=17,
        return_frames=True,
    ),
)

print(f"Generated video shape: {video.shape}")

Image-to-Video (I2V)

from ksana import KsanaPipeline
from ksana.config import KsanaRuntimeConfig, KsanaSampleConfig

pipeline = KsanaPipeline.from_models("path/to/Wan2.2-I2V-A14B")

video = pipeline.generate(
    "Girl gently waves her fan, blows a breath of fairy air, lightning flies from her hand into the sky and thunder begins",
    start_img_path="input.png",
    sample_config=KsanaSampleConfig(steps=40),
    runtime_config=KsanaRuntimeConfig(
        seed=1234,
        size=(512, 512),
        frame_num=17,
    ),
)

Turbo Diffusion

See the run_turbo_diffusion example.

Text-to-Image (T2I)

import torch
from ksana import KsanaPipeline
from ksana.config import (
    KsanaModelConfig,
    KsanaRuntimeConfig,
    KsanaSampleConfig,
    KsanaSolverType,
)

pipeline = KsanaPipeline.from_models(
    "path/to/Qwen-Image",
    model_config=KsanaModelConfig(run_dtype=torch.bfloat16),
)

image = pipeline.generate(
    "A cute orange cat sitting on a windowsill, sunlight streaming through the window onto its fur",
    sample_config=KsanaSampleConfig(
        steps=20,
        cfg_scale=4.0,
        solver=KsanaSolverType.FLOWMATCH_EULER,
    ),
    runtime_config=KsanaRuntimeConfig(
        seed=42,
        size=(1024, 1024),
    ),
)
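
How to save the result depends on generate's return type, which this snippet does not pin down; assuming it is either a PIL.Image or an (H, W, C) tensor in [0, 1] (both assumptions), saving looks like this:

from PIL import Image
import numpy as np

if not isinstance(image, Image.Image):
    # Assumed layout: (height, width, channels), values in [0, 1].
    image = Image.fromarray((image.clamp(0, 1).cpu().numpy() * 255).astype(np.uint8))
image.save("orange_cat.png")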

๐ŸŽฏ Advanced Features

FP8 Quantized Inference

import torch
from ksana import KsanaPipeline
from ksana.config import (
    KsanaModelConfig,
    KsanaAttentionConfig,
    KsanaAttentionBackend,
    KsanaLinearBackend,
)

model_config = KsanaModelConfig(
    run_dtype=torch.float16,
    attention_config=KsanaAttentionConfig(backend=KsanaAttentionBackend.SAGE_ATTN),
    linear_backend=KsanaLinearBackend.FP8_GEMM,
)

pipeline = KsanaPipeline.from_models(
    ("high_noise_fp8.safetensors", "low_noise_fp8.safetensors"),
    model_config=model_config,
)

LoRA Accelerated Inference

from ksana import KsanaPipeline
from ksana.config import KsanaLoraConfig, KsanaSampleConfig

pipeline = KsanaPipeline.from_models(
    "path/to/Wan2.2-T2V-A14B",
    lora_config=KsanaLoraConfig("path/to/Wan2.2-Lightning-4steps-lora"),
)

# Fast generation with 4 steps
video = pipeline.generate(
    prompt,
    sample_config=KsanaSampleConfig(
        steps=4,
        cfg_scale=1.0,
        sigmas=[1.0, 0.9375, 0.6333, 0.225, 0.0],
    ),
)
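
Note that the custom sigmas schedule carries steps + 1 boundary values (five values for four steps here), descending from 1.0 to a terminal 0.0.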

Smart Cache Optimization - Under Active Development

from ksana.config.cache_config import (
    DCacheConfig,
    DBCacheConfig,
    KsanaHybridCacheConfig,
)

# Use hybrid caching strategy
cache_config = KsanaHybridCacheConfig(
    step_cache=DCacheConfig(fast_degree=50),
    block_cache=DBCacheConfig(),
)

video = pipeline.generate(
    prompt,
    cache_config=cache_config,
)

Multi-GPU Distributed Inference

# Method 1: Using CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0,1,2,3 python your_script.py

# Method 2: Using torchrun
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 your_script.py

Inside your_script.py, request a matching number of GPUs:

from ksana import KsanaPipeline
from ksana.config import KsanaDistributedConfig

pipeline = KsanaPipeline.from_models(
    model_path,
    dist_config=KsanaDistributedConfig(num_gpus=4),
)

๐Ÿ“Š Performance Optimization Techniques

Quantization & Compute Optimization

| Technique | Description | Effect |
|---|---|---|
| FP8 GEMM | FP8 quantized matrix multiplication | Reduced memory, improved speed |
| Torchao FP8 Dynamic | Dynamic FP8 quantization | Adaptive precision, balanced quality and performance |
| QKV Fuse | QKV projection fusion | Reduced memory access, improved throughput |
| torch.compile | Graph compilation optimization | 10-30% end-to-end speedup |
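
Of these, torch.compile is plain PyTorch rather than anything KsanaDiT-specific. As a standalone illustration, compiling a module pays a one-time graph-capture cost on the first call and reuses the optimized kernels afterwards:

import torch

layer = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)
compiled = torch.compile(layer)  # compilation is lazy; nothing happens yet

x = torch.randn(8, 1024)
out = compiled(x)  # first call: graph capture + code generation (slow)
out = compiled(x)  # later calls: reuse the optimized graph (fast)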

Attention Backends

| Backend | Characteristics | Use Case |
|---|---|---|
| Flash Attention | High performance, memory efficient | General recommendation |
| Sage Attention | Optimized attention computation | Long sequences |
| Radial Sage Attention | Radial sparse attention | Very long sequences |
| Torch SDPA | PyTorch native implementation | Compatibility priority |
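
Torch SDPA, the compatibility fallback above, is PyTorch's built-in scaled_dot_product_attention, which dispatches to the fastest kernel available (flash, memory-efficient, or the math reference). A minimal standalone example:

import torch
import torch.nn.functional as F

# Shapes: (batch, heads, sequence, head_dim)
q = torch.randn(2, 8, 256, 64)
k = torch.randn(2, 8, 256, 64)
v = torch.randn(2, 8, 256, 64)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 256, 64])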

Caching Strategies

| Strategy | Description | Use Case |
|---|---|---|
| TeaCache | Temporal-aware step-level caching | Video generation optimization |
| MagCache | Adaptive step-level caching | Balanced quality and speed |
| EasyCache | Lightweight step-level caching without pre-prepared parameters | Fast inference with minimal overhead |
| DBCache | Block-level caching | Image generation |
| HybridCache | Step-level + block-level hybrid caching | Maximum acceleration |
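
The step-level strategies (TeaCache, MagCache, EasyCache) share one idea: when the latent changes little between consecutive denoising steps, reuse the previous transformer output instead of recomputing it. A deliberately simplified sketch of that control flow (illustrative only, not KsanaDiT's implementation):

def cached_denoise_step(model, x, t, cache, threshold=0.05):
    """Reuse the last output when the input barely changed (illustrative)."""
    if "x" in cache:
        rel_change = (x - cache["x"]).norm() / (cache["x"].norm() + 1e-8)
        if rel_change < threshold:
            return cache["out"]  # cache hit: skip the transformer forward pass
    out = model(x, t)            # cache miss: run the full forward pass
    cache["x"], cache["out"] = x, out
    return out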

Samplers

| Sampler | Description | Use Case |
|---|---|---|
| Euler | Fast sampling | 4-8 step inference |
| UniPC | High-quality sampling | 20-40 step inference |
| DPM++ | Efficient multi-step sampling | General purpose |
| Turbo Diffusion | Ultra-fast sampling | 4-step inference |
| FlowMatch Euler | Flow matching sampling | Image generation |
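
Euler and FlowMatch Euler both reduce to a first-order update along the model's predicted velocity over a descending sigma schedule. A minimal sketch of that integration loop (illustrative math, not the framework's solver code):

def flowmatch_euler_sample(model, x, sigmas):
    """First-order Euler integration over a descending sigma schedule (illustrative)."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = model(x, sigma)                # model predicts velocity at this noise level
        x = x + (sigma_next - sigma) * v   # step toward the next (lower) sigma
    return x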

๐Ÿ”ง Configuration

Environment Variables

# Log level: debug/info/warn/error
export KSANA_LOGGER_LEVEL=info

Model Configuration

The framework supports configuring model parameters via YAML files located in the ksana/settings/ directory.

๐Ÿ“š Code Examples

Complete example code is available in the examples/ directory.

๐Ÿงช Testing

We maintain comprehensive test coverage. The tests are currently time-consuming, and we are working to streamline them; they are intended for developers.

# Run all tests
pytest tests/

# Run specific tests
pytest tests/ksana/pipelines/wan2_2_t2v_test.py

# Run GPU tests
bash scripts/ci_tests/ci_ksana_gpus.sh

๐Ÿค Contributing

We welcome community contributions! Before submitting a PR, please ensure that:

  1. Your code passes all tests
  2. Your code follows the project code style (enforced with black and ruff)
  3. You include the necessary documentation and comments
  4. You update the relevant README and examples

# Install development dependencies
pip install -e ".[dev]"

# Run code style checks
pre-commit run --all-files

# Run tests
pytest tests/

๐Ÿ“‹ Changelog

For a detailed list of changes in each version, see the CHANGELOG.

๐Ÿ“„ License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

This project benefits from a number of excellent open-source projects.

๐Ÿ“ฎ Contact

๐Ÿ—บ๏ธ Roadmap

Completed โœ…

  • Multi-Platform Support: GPU, NPU, XPU backend support
  • Batch Inference: Support for batch size > 1, merged cond/uncond
  • Video Editing: Wan2.1 Vace video controllable editing
  • Advanced Samplers: DPM++, Turbo Diffusion support
  • Performance Optimization: QKV Fuse + Dynamic FP8 optimization
  • Memory Optimization: Pin Manager to resolve OOM issues
  • Smart Caching: MagCache, TeaCache, EasyCache strategies
  • Image Editing: Qwen Image Edit model support
  • VAE Parallelism: Multi-GPU VAE decoding
  • Monitoring: Inference metrics reporting

In Progress ๐Ÿšง

  • Support for more generation models (Z-Image, Hunyuan, etc.)
  • Memory optimization for longer video generation
  • Cache strategy performance tuning
  • Model quantization toolchain
  • XPU full feature support optimization

If this project helps you, please give us a โญ๏ธ Star!

Made with โค๏ธ by the KsanaDiT Team
