19 results for “topic:speech-generation”
Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
CEP is a software platform designed for users who want to learn or rapidly prototype using standard A.I. components.
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
Self-supervised Generative LM-based Voice Conversion
Swift library for offline text-to-speech synthesis on iOS/macOS. Generate natural speech directly on device using CoreML-optimized FastPitch and HiFiGAN models. No internet required, fully private.
Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class conditioning built on F5-TTS
Unified toolkit for testing and comparing multiple state-of-the-art open-source Text-to-Speech (TTS) models (with voice cloning, multilingual support, and audio samples).
A conversational speech model (CSM) that generates natural-sounding, context-aware speech. Supports multi-speaker conversations and maintains contextual understanding across turns, keeping audio quality consistent throughout the conversation.
Streamlit frontend for Coqui-tts
Text-to-speech generator. Supports multiple accents.
A unified interface to extract hidden representations from various speech foundation models
A simple Discord bot that synthesizes speech directly to a voice channel via text commands with support for sound effects.
PyOrator: A Python-based Speech Generator
JVM library for speech generation written in Kotlin and based on the C++ libraries bark.cpp and piper
Lets your AI alter-ego do the talking while your GPU quietly files for therapy.
An autonomous AI agent for real-time information retrieval and speech generation, leveraging LLMs, RAG, and multi-agent collaboration.
A web-based tool for comparing audio model outputs side-by-side with content diff and quality evaluation.