Qwen 3.5 35B-A3B Uncensored Playground

A Jupyter notebook that runs Qwen3.5-35B-A3B-Uncensored (Q6_K, ~28 GB) locally via llama.cpp and serves a Gradio chat UI with streaming and adjustable parameters.

Requirements

GPU: NVIDIA with CUDA 12.8+ and at least 32 GB VRAM (tested on A100 80GB)
Python: 3.12
Platform: Linux x86_64
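
Before running the notebook, it can help to confirm the machine matches these requirements. The snippet below is a hypothetical pre-flight helper that the notebook does not ship; `check_requirements` and `parse_version` are illustrative names. It assumes you supply the CUDA version and VRAM figures yourself (for example, read off the header of `nvidia-smi`) and reads the Python version and platform from the running interpreter.

```python
import platform
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "12.8" into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def check_requirements(
    cuda_version: str,
    vram_gb: float,
    python_version: tuple[int, int] = sys.version_info[:2],
    system: str = platform.system(),
    machine: str = platform.machine(),
) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Tuple comparison handles versions like "12.10" correctly, unlike string comparison.
    if parse_version(cuda_version) < (12, 8):
        problems.append(f"CUDA {cuda_version} is older than 12.8")
    if vram_gb < 32:
        problems.append(f"{vram_gb} GB VRAM is below the 32 GB minimum")
    if tuple(python_version) != (3, 12):
        problems.append(f"Python {python_version[0]}.{python_version[1]} is not 3.12")
    if (system, machine) != ("Linux", "x86_64"):
        problems.append(f"{system} {machine} is not Linux x86_64")
    return problems
```

For example, on an A100 80GB running Python 3.12 on Linux x86_64, `check_requirements("12.8", 80)` should return an empty list.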

Quick start with Modal (recommended)

The fastest way to get running, with no local GPU needed:

Create a free account at modal.com
Go to Notebooks and upload qwen_uncensored.ipynb
In the sidebar, set the kernel GPU to A100 80GB (or any GPU with 32+ GB VRAM)
Run all cells — the model downloads, loads onto the GPU, and Gradio gives you a public chat URL

The whole setup takes about 5 minutes. You only pay for the seconds the kernel is running.

Quick start (local)

Open qwen_uncensored.ipynb and run all cells in order:

Cell	What it does
0	Installs JamePeng's llama-cpp-python fork (pre-built CUDA 12.8 wheel), Gradio, and NVIDIA runtime libs
1	Downloads the Q6_K GGUF from Hugging Face (cached after first run)
2	Loads the model onto the GPU
3	Launches the Gradio chat interface

Cell 3 launches Gradio with share=True, which prints a temporary public URL you can share.

Default parameters

Defaults follow the official Qwen3.5 recommended settings for thinking mode (general tasks):

Parameter	Default
Temperature	1.0
Top-p	0.95
Top-k	20
Max tokens	8192
Repeat penalty	1.0

All parameters are adjustable via sliders in the sidebar.
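
The defaults can live in one place and be merged with whatever the sliders send. A minimal sketch, assuming the keyword names of upstream llama-cpp-python's `Llama.create_chat_completion` (`temperature`, `top_p`, `top_k`, `max_tokens`, `repeat_penalty`); `DEFAULTS` and `sampling_kwargs` are hypothetical names, not taken from the notebook.

```python
# Defaults from the table above (Qwen3.5 thinking mode, general tasks).
DEFAULTS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "max_tokens": 8192,
    "repeat_penalty": 1.0,
}


def sampling_kwargs(**overrides) -> dict:
    """Merge slider values over the defaults, rejecting unknown parameter names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sampling parameters: {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    # Sliders may return floats; token counts must be integers.
    kwargs["top_k"] = int(kwargs["top_k"])
    kwargs["max_tokens"] = int(kwargs["max_tokens"])
    return kwargs
```

The result would be unpacked into the call, e.g. `llm.create_chat_completion(messages=..., stream=True, **sampling_kwargs(temperature=0.7))`. Rejecting unknown names surfaces a mistyped slider key immediately instead of deep inside the model call.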

Model

Base: Qwen/Qwen3.5-35B-A3B — 35B total params, ~3B active per forward pass (MoE)
Uncensored by: HauhauCS (Aggressive variant, 0/465 refusals)
Quantization: Q6_K (6.58 BPW, imatrix)
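
The ~28 GB size and the 32 GB VRAM minimum follow from back-of-envelope arithmetic: quantized weight size is roughly parameters × bits per weight ÷ 8. The sketch below plugs in the nominal 35B and 6.58 BPW figures listed above. It is an estimate only: it ignores the KV cache, compute buffers and GGUF metadata, and the real parameter count differs slightly from the rounded 35B. `weight_size_gb` is a hypothetical helper name.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# With every layer offloaded to the GPU, all experts must be resident in VRAM,
# so memory is sized by the total parameter count, not the ~3B active per token.
total = weight_size_gb(35e9, 6.58)
# Rough amount of weight data read per generated token through the active experts.
active = weight_size_gb(3e9, 6.58)
print(f"resident weights ~{total:.1f} GB, weights touched per token ~{active:.1f} GB")
# prints: resident weights ~28.8 GB, weights touched per token ~2.5 GB
```

On a 32 GB card, the few gigabytes left after the weights are what remains for the KV cache and compute buffers, which is why the minimum sits only a little above the file size.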

License

MIT

lf1up/qwen-uncensored-playground