KohakuBlueleaf/KohakuRiver
A lightweight cluster manager that turns your small fleet of nodes into one powerful computer, using Docker for environment consistency without the overhead of enterprise orchestration systems.
KohakuRiver
Self-hosted cluster manager for small teams and research labs.
Distribute tasks and persistent VPS sessions across compute nodes with
Docker containers, QEMU/KVM VMs, VXLAN overlay networking, and GPU passthrough.
Documentation · Quick Start · CLI Reference
Overview
Small teams with 3-20 compute nodes face an awkward middle ground — too many machines to manage with SSH scripts, too few to justify Slurm or Kubernetes. KohakuRiver treats your cluster as one big computer: submit a command or launch a VPS, and it runs on the right node with the right resources.
Docker in KohakuRiver functions as a portable virtual environment. Set up once, package as a tarball, and every node has the same environment automatically.
Key Capabilities
- Command tasks — one-shot batch execution with stdout/stderr capture
- VPS sessions — persistent interactive environments with SSH, terminal, and port forwarding
- Dual backends — Docker containers for lightweight tasks, QEMU/KVM VMs for full hardware isolation
- GPU support — NVIDIA Container Toolkit for Docker, VFIO passthrough for VMs
- Overlay networking — VXLAN L3 hub topology, containers on different nodes communicate directly
- Tunnel system — WebSocket-based terminal and port forwarding without Docker port mapping
- Web dashboard — Vue.js UI for cluster management, monitoring, and terminal access
- Terminal TUI — full-screen dashboard and IDE mode with file tree and integrated terminal
- Auth system — roles (admin/operator/user), API tokens, invitation-based registration
Screenshots
Web Dashboard
|
Cluster Overview |
Node Monitoring |
|
Task Management |
Task Create |
VPS Management
|
VPS Create
|
Terminal TUI
|
TUI Dashboard |
IDE Mode |
Architecture
┌──────────┐ ┌──────────────────────┐
│ CLI │ │ Web Dashboard │
└────┬─────┘ └──────────┬───────────┘
│ │
▼ ▼
┌────────────────────────────────────────────────────────────┐
│ Host Server (:8000) │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ FastAPI │ │ Task │ │ Overlay │ │ SSH Proxy │ │
│ │ API │ │ Scheduler │ │ Manager │ │ (:8002) │ │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ ┌────────────┐ ┌────────────────────────────────────┐ │
│ │ Auth │ │ SQLite DB (Peewee ORM) │ │
│ │ Service │ │ tasks, nodes, users, auth │ │
│ └────────────┘ └────────────────────────────────────┘ │
└────────────────────────────┬───────────────────────────────┘
│ HTTP + VXLAN
┌────────────────┴────────────────┐
│ │
┌───────────▼────────────────┐ ┌─────────────▼──────────────┐
│ Runner Node A (:8001) │ │ Runner Node B (:8001) │
│ │ │ │
│ ┌──────────────────────┐ │ │ ┌──────────────────────┐ │
│ │ Runner Agent │ │ │ │ Runner Agent │ │
│ │ (FastAPI) │ │ │ │ (FastAPI) │ │
│ └──────────────────────┘ │ │ └──────────────────────┘ │
│ │ │ │
│ ┌──────────┐ ┌──────────┐ │ │ ┌──────────┐ ┌──────────┐ │
│ │ Docker │ │ Tunnel │ │ │ │ Docker │ │ QEMU │ │
│ │ Engine │ │ Server │ │ │ │ Engine │ │ /KVM │ │
│ └──────────┘ └──────────┘ │ │ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ │ │ ┌──────────┐ ┌──────────┐ │
│ │ VPS │ │ VXLAN │ │ │ │ Tunnel │ │ VXLAN │ │
│ │ Manager │ │ Agent │ │ │ │ Server │ │ Agent │ │
│ └──────────┘ └──────────┘ │ │ └──────────┘ └──────────┘ │
│ │ │ │
│ ┌──────┐ ┌──────┐ │ │ ┌──────┐ ┌──────┐ │
│ │VPS 1 │ │VPS 2 │ │ │ │VPS 3 │ │ VM 1 │ │
│ └──────┘ └──────┘ │ │ └──────┘ └──────┘ │
└────────────────────────────┘ └────────────────────────────┘
│
┌────────────────▼────────────────┐
│ Shared Storage (optional) │
│ NFS / Samba / SSHFS │
└─────────────────────────────────┘
| Tier | Role |
|---|---|
| Host (:8000) | Central control plane. Task scheduling, node management, overlay hub, SSH/WebSocket proxy, SQLite database. |
| Runners (:8001) | Compute node agents. Execute tasks in Docker/QEMU, monitor resources, run tunnel server, manage overlay agent. |
| Containers / VMs | Workloads. Docker for lightweight tasks, QEMU/KVM for full isolation with GPU passthrough. |
| Shared Storage | (Optional) NFS/Samba/SSHFS. Simplifies tarball distribution. Path can differ per node. Not required with registry images or VMs. |
Quick Start
Prerequisites
- Python >= 3.10
- Docker Engine on host and runner nodes
- (Optional) Shared filesystem, NVIDIA drivers + Container Toolkit, QEMU/KVM + IOMMU
Install
git clone https://github.com/KohakuBlueleaf/KohakuRiver.git
cd KohakuRiver
pip install .
# With GPU monitoring
pip install ".[gpu]"Configure
kohakuriver init config --generateEdit ~/.kohakuriver/host_config.py:
HOST_REACHABLE_ADDRESS = "192.168.1.100" # IP that runners can reach
SHARED_DIR = "/mnt/cluster-share" # shared storage path (optional)Edit ~/.kohakuriver/runner_config.py:
HOST_ADDRESS = "192.168.1.100" # host IP
SHARED_DIR = "/mnt/cluster-share" # same shared storageStart
# On the host machine
kohakuriver.host
# On each runner node
kohakuriver.runnerFor production, use systemd:
kohakuriver init service --host # on host
kohakuriver init service --runner # on runners
sudo systemctl enable --now kohakuriver-host
sudo systemctl enable --now kohakuriver-runnerFirst Task
kohakuriver task submit -t mynode -- echo "Hello from the cluster!"
kohakuriver task logs <task_id>First VPS
kohakuriver vps create -t mynode -c 4 -m 8G --ssh
kohakuriver vps connect <task_id> # SSH
kohakuriver connect <task_id> # WebSocket terminal
kohakuriver connect <task_id> --ide # TUI IDEFeatures
Container as Portable Environment
Create a Docker container on the host, customize it interactively, package it as a tarball. Every runner automatically syncs it. Or pull directly from any Docker registry.
kohakuriver docker container create python:3.12-slim my-env
kohakuriver docker container shell my-env # customize interactively
kohakuriver docker tar create my-env # package for distributionResource Management
# 4 cores, 8GB memory, GPUs 0 and 1 on node "alpha"
kohakuriver task submit -t alpha::0,1 -c 4 -m 8G --container my-env -- python train.py- CPU — core count with pinning, NUMA binding (
-t node:numa_id) - Memory — per-task limits (
-m 8G) - GPU — target by index (
-t node::0,1), NVIDIA Container Toolkit for Docker, VFIO for VMs - Multi-target — submit to multiple nodes/GPUs in one command
QEMU/KVM Virtual Machines
Full VMs with VFIO GPU passthrough. Cloud-init provisions SSH keys, networking, and NVIDIA drivers automatically.
kohakuriver qemu check # discover capabilities
kohakuriver vps create --backend qemu -t mynode::0 --vm-memory 16384 -c 8 --sshOverlay Network
VXLAN L3 hub topology with host as central router. Set OVERLAY_ENABLED=True and open UDP 4789 — tunnel setup, subnet allocation, IP reservation, routing, and firewall rules are handled automatically.
Host (10.128.0.1/12)
│
┌─────────┴──────────┐
VXLAN VNI=101 VXLAN VNI=102
│ │
Runner 1 Runner 2
10.128.64.0/18 10.128.128.0/18
Supports up to 63 runners with ~16,380 IPs each (configurable).
Access Without Port Mapping
A Rust tunnel client (Tokio + Tungstenite) inside containers enables access through WebSocket tunnels:
kohakuriver connect <task_id> # full TTY terminal (vim, htop)
kohakuriver connect <task_id> --ide # TUI IDE with file tree
kohakuriver forward <task_id> 8888 # port forwarding (Jupyter, etc.)
kohakuriver forward <task_id> 5353 --proto udp
kohakuriver vps connect <task_id> # SSH through host proxyNo port conflicts. Works through firewalls. Survives container restarts.
Snapshots
Docker VPS instances support snapshots — auto-snapshot on stop, manual creation, restore on restart, configurable retention (default: 3 per VPS).
Authentication
Session-based and token-based auth with role hierarchy (admin, operator, user). Invitation-based registration. API tokens for CLI and automation.
Web Dashboard
The Vue.js frontend provides cluster overview, task/VPS management, Docker environment management, web terminal (xterm.js), GPU monitoring (Plotly.js), and admin panel.
cd src/kohakuriver-manager
npm install
npm run dev # port 5173
npm run build # production buildCLI Reference
Tasks
kohakuriver task submit [OPTIONS] -- CMD # submit a command task
kohakuriver task list # list tasks
kohakuriver task status <id> # detailed status
kohakuriver task logs <id> # stdout (--stderr, --follow)
kohakuriver task kill <id> # kill running task
kohakuriver task pause <id> # pause
kohakuriver task resume <id> # resume
kohakuriver task watch <id> # live monitorSubmit options
| Flag | Description |
|---|---|
-t, --target NODE[::GPU,GPU] |
Target node, optional NUMA (:numa) and GPU (::0,1) |
-c, --cores N |
CPU cores (0 = no limit) |
-m, --memory SIZE |
Memory limit (e.g., 8G, 512M) |
--container NAME |
Container environment name |
--image NAME |
Docker registry image (alternative to tarball) |
--privileged |
Run with Docker --privileged |
--mount SRC:DST |
Additional bind mounts |
--wait |
Wait for task completion |
VPS
kohakuriver vps create [OPTIONS] # create VPS
kohakuriver vps list # list instances (--all for stopped)
kohakuriver vps status <id> # detailed status
kohakuriver vps stop <id> # stop (preserves state)
kohakuriver vps restart <id> # restart
kohakuriver vps pause <id> / resume <id> # pause / resume
kohakuriver vps connect <id> # SSH via proxyVPS options
| Flag | Description |
|---|---|
--backend docker|qemu |
Workload backend (default: docker) |
--ssh |
Enable SSH access |
--gen-ssh-key |
Generate SSH keypair |
--public-key-file PATH |
SSH public key file |
--vm-memory MB |
VM RAM in MB (QEMU only, default: 4096) |
--vm-disk GB |
VM disk in GB (QEMU only, default: 20) |
--vm-image NAME |
Base VM image (QEMU only, default: ubuntu-22.04) |
Access & Terminal
kohakuriver connect <id> # WebSocket terminal (full TTY)
kohakuriver connect <id> --ide # TUI IDE with file tree
kohakuriver forward <id> <port> # port forwarding
kohakuriver forward <id> <port> -l <local> # custom local port
kohakuriver forward <id> <port> --proto udp # UDP forwarding
kohakuriver terminal # TUI dashboardNodes
kohakuriver node list # list nodes (--status online|offline)
kohakuriver node status <hostname> # node details
kohakuriver node health [hostname] # health metrics
kohakuriver node summary # cluster summary
kohakuriver node overlay # overlay network statusDocker
kohakuriver docker container list # list host containers
kohakuriver docker container create IMG NAME # create from image
kohakuriver docker container shell NAME # interactive shell
kohakuriver docker container start NAME # start
kohakuriver docker container stop NAME # stop
kohakuriver docker container delete NAME # delete
kohakuriver docker tar list # list tarballs
kohakuriver docker tar create NAME # create tarball
kohakuriver docker tar delete NAME # delete tarball
kohakuriver docker images # list images on runnersQEMU/KVM, Auth, SSH, Setup
QEMU/KVM
kohakuriver qemu check # verify KVM, IOMMU, VFIO GPUs
kohakuriver qemu image create # create base VM image
kohakuriver qemu image list # list base images
kohakuriver qemu instances # list VM instances
kohakuriver qemu cleanup <id> # delete VM instance
kohakuriver qemu acs-override # apply ACS overrideAuthentication
kohakuriver auth login # login
kohakuriver auth logout # logout (--revoke to revoke token)
kohakuriver auth status # current auth status
kohakuriver auth token list # list API tokens
kohakuriver auth token create <name> # create token
kohakuriver auth token revoke <id> # revoke tokenSSH
kohakuriver ssh connect <id> # SSH to VPS via proxy
kohakuriver ssh config # generate SSH config for all VPSSetup
kohakuriver init config --generate # generate config files
kohakuriver init service --host # create host systemd service
kohakuriver init service --runner # create runner systemd service
kohakuriver config show # show current configConfiguration
Configuration files are Python modules in ~/.kohakuriver/. Generate with kohakuriver init config --generate.
Host settings (~/.kohakuriver/host_config.py)
| Setting | Default | Description |
|---|---|---|
HOST_REACHABLE_ADDRESS |
"127.0.0.1" |
IP/hostname runners and clients use to reach the host |
HOST_PORT |
8000 |
API server port |
HOST_SSH_PROXY_PORT |
8002 |
SSH proxy port |
SHARED_DIR |
"/mnt/cluster-share" |
Shared storage path |
DB_FILE |
"cluster_management.db" |
SQLite database path |
OVERLAY_ENABLED |
False |
Enable VXLAN overlay networking |
DEFAULT_CONTAINER_NAME |
"kohakuriver-base" |
Default container environment |
HEARTBEAT_INTERVAL_SECONDS |
5 |
Runner heartbeat interval |
HEARTBEAT_TIMEOUT_FACTOR |
6 |
Missed heartbeats before offline |
TASKS_PRIVILEGED |
False |
Run containers with --privileged |
Runner settings (~/.kohakuriver/runner_config.py)
| Setting | Default | Description |
|---|---|---|
HOST_ADDRESS |
"127.0.0.1" |
Host server address |
HOST_PORT |
8000 |
Host server port |
SHARED_DIR |
"/mnt/cluster-share" |
Shared storage path (can differ from host) |
LOCAL_TEMP_DIR |
"/tmp/kohakuriver" |
Local temporary storage |
OVERLAY_ENABLED |
False |
Enable VXLAN overlay networking |
TUNNEL_ENABLED |
True |
Enable tunnel server |
VM_IMAGES_DIR |
"~/.kohakuriver/vm-images" |
QEMU base image directory |
VM_DEFAULT_MEMORY_MB |
4096 |
Default VM RAM |
VM_ACS_OVERRIDE |
False |
Enable ACS override for IOMMU groups |
Environment variables
| Variable | Description |
|---|---|
KOHAKURIVER_HOST |
Host server address (for CLI) |
KOHAKURIVER_PORT |
Host server port (for CLI) |
KOHAKURIVER_SSH_PROXY_PORT |
SSH proxy port (for CLI) |
What KohakuRiver Is (and Isn't)
| KohakuRiver is for... | KohakuRiver is not for... |
|---|---|
| Small clusters (3-20 nodes) | Large-scale HPC (Slurm, PBS) |
| Command tasks and interactive VPS sessions | Multi-service orchestration (Kubernetes) |
| Docker containers as portable environments | Complex CI/CD pipelines |
| Independent tasks and batch submission | DAG workflow orchestration (Airflow, Prefect) |
| Research labs, home labs, small teams | Multi-tenant production environments |
| GPU workloads with simple allocation | Advanced GPU scheduling (MIG, time-slicing) |
Project Structure
src/
├── kohakuriver/ # Python backend
│ ├── host/ # Host server (FastAPI :8000)
│ ├── runner/ # Runner agent (FastAPI :8001)
│ ├── cli/ # CLI (Typer + Rich + Textual)
│ ├── db/ # Peewee ORM models
│ ├── docker/ # Docker client wrapper
│ ├── qemu/ # QEMU/KVM + VFIO + cloud-init
│ ├── models/ # Pydantic request/response models
│ └── utils/ # Shared utilities, config
├── kohakuriver-manager/ # Vue.js web dashboard
├── kohakuriver-tunnel/ # Rust tunnel client
└── kohakuriver-doc/ # Documentation site
Tech Stack
| Layer | Technologies |
|---|---|
| Backend | Python 3.10+, FastAPI, Uvicorn, Peewee ORM, SQLite, pyroute2 |
| CLI | Typer, Rich, Textual |
| Frontend | Vue.js 3, Vite, Element Plus, Pinia, xterm.js, Plotly.js |
| Tunnel | Rust, Tokio, Tungstenite |
| VM | QEMU/KVM, VFIO, cloud-init, virtio-9p |
| Auth | bcrypt, session cookies, API tokens |
Documentation
Full documentation: riverdoc.kohaku-lab.org
- User Guide — Installation, setup, tasks, VPS, CLI reference, administration
- Developer Guide — Architecture internals, code structure, conventions
- Technical Report — Deep-dive analyses of overlay networking, QEMU virtualization, tunnel protocol, authentication
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
For commercial or proprietary license exemptions, contact: kohaku@kblueleaf.net








