84 results for “topic:audio-visual”
A curated list of different papers and datasets in various areas of audio-visual processing
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
[NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
An audio visualizer for React. Provides separate components to visualize both live audio and audio blobs.
Patchies is a creative coding patcher for audio, visual and computational things that runs on the web. Connect tools you know and try new ones ✨
This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
🎙 Waveform path generator for SVG 🎶
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
Implementation of "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, ICCV, 2019" in PyTorch
Human emotion understanding using a multimodal dataset.
Libvisual Audio Visualization
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors" (Spotlight at BMVC 2022)
Programmatic minimalistic audio visualizations.
Official code for the WACV 2024 paper "Annotation-free Audio-Visual Segmentation"
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Audio-visual corruption modeling from our CVPR 2023 paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring"
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official PyTorch implementation of the paper "High-Quality Visually-Guided Sound Separation from Diverse Categories"
[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
Transformer-based online speech recognition system with TensorFlow 2
Code for the CVPR 2021 paper "Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing"
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
Audio-visual diarization pipeline used for creating VoxConverse dataset
Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model" (AVLIT)
Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio
Accepted by TMM 2022
[AAAI 2026] Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
Code for the INTERSPEECH 2024 paper "LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition"