1,903 results for “topic:speech”
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
SoftVC VITS Singing Voice Conversion
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
kaldi-asr/kaldi is the official location of the Kaldi project.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
ModelScope: bring the notion of Model-as-a-Service to life.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
💬 Speech recognition for your site
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Silero Models: pre-trained text-to-speech models made embarrassingly simple
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Low-latency AI engine for mobile devices & wearables
A fast multimodal LLM for real-time voice
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
Foundational model for human-like, expressive TTS
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Noise suppression using deep filtering
🚀 Curated collection of amazing Python scripts, from basic to advanced, with automation task scripts.
Code examples for new APIs of iOS 10.
OpenAI Whisper ASR Webservice API
A simple, high-quality voice conversion tool focused on ease of use and performance.
Lingvo
Data manipulation and transformation for audio signal processing, powered by PyTorch