Repos
55
Stars
6.0k
Forks
299
Top Language
Python
Top Repositories
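Singing voice timbre conversion model based on VITS and SoftVC
Emotion-controllable speech synthesis model that requires no emotion labels, based on VITS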
text to speech using autoregressive transformer and VITS
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
VITS with phoneme-level prosody modeling based on MaskGIT
An Implementation of Singing Voice Conversion Based on Diffsinger
Repositories
55
Singing voice timbre conversion model based on VITS and SoftVC
text to speech using autoregressive transformer and VITS
Emotion-controllable speech synthesis model that requires no emotion labels, based on VITS
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
An Implementation of Singing Voice Conversion Based on Diffsinger
forge version of majo's broom mod
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Scripts for automated dataset creation
VITS with phoneme-level prosody modeling based on MaskGIT
singing voice conversion based on glow-tts
vits
Majo's Broom Mod
Singing voice conversion based on FreeVC
TTS model based on VITS, FastSpeech2, and VISinger
The open source code for SimpleSpeech series
Official code for "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
Generative models for conditional audio generation
No description provided.
Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Conditional Variational Auto-Encoder with Jointly Training FastSpeech2(+Conformer) and HiFi-GAN for End to End Text to Speech
No description provided.
No description provided.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
mandarin version of MQTTS
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
No description provided.