# WEST

WEST (We Speech Toolkit) is an LLM-based speech toolkit for speech understanding, generation, and interaction.
## Highlights

- **Fully LLM-based**: Stands on the shoulders of giants by reusing mature architectures, ecosystems (e.g., Hugging Face), and methods (e.g., sequence packing) from large models.
- **Full-stack**: Supports tasks such as recognition, synthesis, understanding, dialogue, and multimodal interaction, and is extensible to incorporate open-source models.
- **Simple and Stupid**: A simple and stupid speech toolkit that everyone can Touch.
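As a side note on the "sequence packing" technique mentioned above: the idea is to concatenate several variable-length token sequences into one fixed-size buffer so a training batch carries almost no padding. The sketch below is a generic, greedy illustration of the concept, not WEST's actual implementation; the function name `pack_sequences` and its interface are hypothetical.

```python
def pack_sequences(seqs, max_len):
    """Greedily pack token sequences into buffers of at most max_len tokens.

    Returns a list of (tokens, boundaries) pairs, where boundaries records
    the start offset of each original sequence inside the pack, so that
    attention can later be masked per segment.
    """
    packs = []
    cur, bounds = [], []
    for seq in seqs:
        # Start a new pack when the current one cannot hold this sequence.
        if cur and len(cur) + len(seq) > max_len:
            packs.append((cur, bounds))
            cur, bounds = [], []
        bounds.append(len(cur))
        cur.extend(seq)
    if cur:
        packs.append((cur, bounds))
    return packs

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packs = pack_sequences(seqs, max_len=6)
# → [([1, 2, 3, 4, 5], [0, 3]), ([6, 7, 8, 9, 10], [0, 4])]
```

With `max_len=6`, ten tokens fit into two buffers with a single unused slot, whereas naive padding of the four sequences to the longest (length 4) would waste six slots.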
## Install

```bash
conda create -n west python=3.10
conda activate west
pip install -r requirements.txt
```

## Supported Tasks and Models
| Task | Model | Recipe |
|---|---|---|
| Speech Recognition | TouchASU (built-in) | aishell |
| Speech Synthesis | TouchTTS (built-in) | libritts |
| Speech QA | TouchASU (built-in) | belle_1.4M_qa |
| Speech Interaction | TouchChat (built-in) | |
| MultiModal Interaction | TouchOmni (built-in) | |
## Citation

Our paper is available on arXiv; you can cite it as:

```bibtex
@misc{zhang2025westllmbasedspeech,
  title={WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction},
  author={Binbin Zhang and Chengdong Liang and Shuai Wang and Xuelong Geng and Zhao Guo and Haoyu Li and Hao Yin and Xipeng Yang and Pengshen Zhang and Changwei Ma and Lei Xie},
  year={2025},
  eprint={2509.19902},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.19902},
}
```
## Discussion & Communication

We created a WeChat group for better discussion and quicker responses.
Please scan the personal QR code on the left; its owner will invite you to the chat group.
You can also scan the QR code on the right to follow the official account of the WeNet Community.