11 results for “topic:audio-language-model”
We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for complex reasoning tasks. Building on this resource, we propose SoundMind, a rule-based reinforcement learning (RL) algorithm tailored to endow audio language models (ALMs) with deep bimodal reasoning abilities.
Fun-ASR is a large end-to-end speech recognition model launched by Tongyi Lab.
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
World's first open-source real-time end-to-end spoken dialogue model with personalized voice cloning.
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Code for DeSTA2.5-Audio, a general-purpose large audio-language model (LALM).
Outlining and demonstrating how language models are able to understand image, video, and text content.
[EMNLP 2025] Official code repository for the paper "TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models", accepted at EMNLP 2025.
Source code for our paper "When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models", ICASSP 2026.
Core shared libraries for multimodal Kani extensions.
No description provided.