11 results for “topic:audio-language-model”
We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for complex reasoning tasks. Building on this resource, we propose SoundMind, a rule-based reinforcement learning (RL) algorithm tailored to endow audio language models (ALMs) with deep bimodal reasoning abilities.
Fun-ASR is a large end-to-end speech recognition model launched by Tongyi Lab.
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
World's first open-source real-time end-to-end spoken dialogue model with personalized voice cloning.
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Code for DeSTA2.5-Audio, a general-purpose large audio-language model (LALM).
Outlining and demonstrating how language models are able to understand image, video, and text content.
[EMNLP 2025] Official code repository for the paper "TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models", accepted at EMNLP 2025.
Source code for our paper "When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models", ICASSP 2026.
Core shared libraries for multimodal Kani extensions.
No description provided.