Xiong Wang
wangxiongts
Speech/LLM Algorithm Engineer@Alibaba Qwen Team
Repos
6
Stars
20
Forks
14
Top Language
Python
Top Repositories
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
A high-throughput and memory-efficient inference and serving engine for LLMs
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Repositories
6
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
No description provided.
A high-throughput and memory-efficient inference and serving engine for LLMs
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM