5 results for “topic:active-speaker-detection”
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)
AnnoTheia is a data annotation toolkit that detects when a person is speaking in a scene and transcribes their speech; its modules can be swapped out to support different languages.
Accepted by TMM 2022
SyncNet based on Meta's Perception Encoder Audio-Visual (PE-AV)