# WEST

WEST (We Speech Toolkit) is an LLM-based speech toolkit for speech understanding, generation, and interaction.
## Highlights

- **Fully LLM-based**: Stands on the shoulders of giants by reusing mature architectures, ecosystems (e.g., Hugging Face), and methods (e.g., sequence packing) from large models.
- **Full-stack**: Supports tasks such as recognition, synthesis, understanding, dialogue, and multimodal interaction, and is extensible to incorporate open-source models.
- **Simple and Stupid**: A simple and stupid speech toolkit that everyone can Touch.
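As a side note on the "sequence packing" technique mentioned above: the idea is to concatenate several variable-length token sequences into one fixed-size buffer so a training batch carries almost no padding. The sketch below is a generic, greedy illustration of the concept, not WEST's actual implementation; the function name `pack_sequences` and its interface are hypothetical.

```python
def pack_sequences(seqs, max_len):
    """Greedily pack token sequences into buffers of at most max_len tokens.

    Returns a list of (tokens, boundaries) pairs, where boundaries records
    the start offset of each original sequence inside the pack, so that
    attention can later be masked per segment.
    """
    packs = []
    cur, bounds = [], []
    for seq in seqs:
        # Start a new pack when the current one cannot hold this sequence.
        if cur and len(cur) + len(seq) > max_len:
            packs.append((cur, bounds))
            cur, bounds = [], []
        bounds.append(len(cur))
        cur.extend(seq)
    if cur:
        packs.append((cur, bounds))
    return packs

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packs = pack_sequences(seqs, max_len=6)
# → [([1, 2, 3, 4, 5], [0, 3]), ([6, 7, 8, 9, 10], [0, 4])]
```

With `max_len=6`, ten tokens fit into two buffers with a single unused slot, whereas naive padding of the four sequences to the longest (length 4) would waste six slots.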
## Install

```bash
conda create -n west python=3.10
conda activate west
pip install -r requirements.txt
```

## Supported Tasks and Models
| Task | Model | Recipe |
|---|---|---|
| Speech Recognition | TouchASU (built-in) | aishell |
| Speech Synthesis | TouchTTS (built-in) | libritts |
| Speech QA | TouchASU (built-in) | belle_1.4M_qa |
| Speech Interaction | TouchChat (built-in) | |
| MultiModal Interaction | TouchOmni (built-in) | |
## Citation

Our paper is available on arXiv; you can cite it as:

```bibtex
@misc{zhang2025westllmbasedspeech,
  title={WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction},
  author={Binbin Zhang and Chengdong Liang and Shuai Wang and Xuelong Geng and Zhao Guo and Haoyu Li and Hao Yin and Xipeng Yang and Pengshen Zhang and Changwei Ma and Lei Xie},
  year={2025},
  eprint={2509.19902},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.19902},
}
```
## Discussion & Communication

We created a WeChat group for better discussion and quicker responses.
Please scan the personal QR code on the left; its owner will invite you to the chat group.
You can also scan the QR code on the right to follow the official account of the WeNet Community.