93 results for “topic:torchaudio”
A generative speech model for daily dialogue.
VoxNovel: generate audiobooks giving each character a different voice actor.
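:star: Undergraduate thesis project: design and development of a content-based music recommendation system. The model training code is built with the PyTorch framework, and the front end and back end are built with Django.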
Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
🎙️ Arabic TTS models (Tacotron2, FastPitch)
Cascade is a production-ready, high-performance, and low-latency audio stream processing library designed for Voice Activity Detection (VAD). Built upon the excellent Silero VAD model, Cascade significantly reduces VAD processing latency while maintaining high accuracy through its 1:1:1 binding architecture and asynchronous streaming technology.
DEPRECATED!
Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
Kokoro Language Model Training Script for Russian (Ruslan Corpus)
KAE: KAN-based AutoEncoder (AE, VAE, VQ-VAE, RVQ, etc.)
Wheels for Python 3
Cross-compilation of PyTorch armv7l (32bit) for RaspberryPi OS
Instructions on how to build PyTorch 2.8 on Debian 12 with support for the AMD gfx1010 architecture
Open Translator: speech-to-speech and speech-to-text translator with voice cloning and other cool features
Tutorial on installing torch/PyTorch with CUDA using uv
Sound classification on Urban Sound Dataset
A utility for wrapping the Free Spoken Digit Dataset into PyTorch-ready data set splits.
Speech command classification on the Speech-Command v0.02 dataset using PyTorch and torchaudio. In this example, three models have been trained using raw signal waveforms, MFCC features, and MelSpectrogram features.
High fidelity music synthesis using diffusion and UnivNet.
🎙️ German TTS (FastPitch) with Thorsten voice / emotional
This is a simple artificial neural network model that uses deep learning and torchaudio to classify cat and dog sounds.
The Voice Cloner is a Python-based project that leverages Tacotron 2 and WaveGlow models for text-to-speech (TTS) synthesis and basic voice cloning. This project supports 22 official Indian languages, including Sanskrit, making it versatile for multilingual text input.
(😞 😨 😄 😮 😍 😠 😐 🤮) This is a simple DL API that classifies human emotions from audio and text.
Demonstration for the Qwen/Qwen3-TTS-12Hz models using Daggr for modular UI nodes. Supports voice design (prompt-to-speech), voice cloning (zero-shot), and custom voice synthesis with multiple speakers and languages. Features lazy model loading to optimize memory, multi-model sizes (0.6B and 1.7B), ASR and support for various audio inputs.
🤖 Telegram bot powered by Deep Learning. Automatically assesses the safety of audio files and voice messages for people suffering from misophonia.
Speech to Text with Wav2Vec2 using torchaudio
Experiments in neural networks for audio generation.
Utilities for preprocessing the Switchboard and WSJ corpora in Python3
Mixer-TTS for efficient TTS
This repo implements a deep learning pipeline for classifying environmental sounds from the ESC-50 dataset.