763 results for “topic:speech-processing”
A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Reading list for research topics in multimodal machine learning
Foundation Architecture for (M)LLMs
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WaveNet vocoder
AI powered speech denoising and enhancement
Controllable and fast Text-to-Speech for over 7000 languages!
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
General Speech Restoration
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
SincNet is a neural architecture for efficiently processing raw audio samples.
Open source audio annotation tool for humans
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
You can find the speech algorithms you want here
Become a cracked AI/ML Research Engineer
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
A neural network for end-to-end speech denoising
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TFLite, ONNX, and real-time audio processing support.
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Speech-focused labs, companies, resources, internships, and more; recommendations and self-nominations are welcome
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
🔉 spafe: Simplified Python Audio Features Extraction
UniSpeech - Large Scale Self-Supervised Learning for Speech