GitHunt — Discover GitHub Repositories

Cascade is a production-ready, high-performance, and low-latency audio stream processing library designed for Voice Activity Detection (VAD). Built upon the excellent Silero VAD model, Cascade significantly reduces VAD processing latency while maintaining high accuracy through its 1:1:1 binding architecture and asynchronous streaming technology.

Python · 84 · 10 · Updated 1 month ago

async-await, audio, high-performance, numpy, onnxruntime, +4

evshiron/rocm_lab

DEPRECATED!

Shell · 50 · 5 · Updated 4 months ago

gfx1100, rocm, tensorflow, torch, torchaudio, +1

KentoNishi/torch-time-stretch

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

Python · 40 · 3 · Updated 8 months ago

audio-augmentation, augmentation, gpu-support, pytorch, sound-processing, +3

igorshmukler/kokoro-ruslan

Kokoro Language Model Training Script for Russian (Ruslan Corpus)

Python · 39 · 11 · Updated 9 hours ago

ml, pytorch, torchaudio, tts

SekiroRong/KAN-AutoEncoder

KAE : KAN-based AutoEncoder (AE, VAE, VQ-VAE, RVQ, etc.)

Jupyter Notebook · 38 · 2 · Updated 3 months ago

ae, audio, audio-processing, autoencoder, deep-learning, +7

torchsmoke/Python3-Wheels

Wheels for Python 3

27 · 3 · Updated 3 years ago

python3, pytorch, torchaudio, torchvision, wheel

PINTO0309/pytorch4raspberrypi

Cross-compilation of PyTorch armv7l (32bit) for RaspberryPi OS

Dockerfile · 21 · 2 · Updated 2 months ago

armv7l, pytorch, raspberry-pi, torchaudio, torchvision

Efenstor/PyTorch-ROCm-gfx1010

Instructions on how to build PyTorch 2.8 on Debian 12 with support for the AMD gfx1010 architecture

Shell · 17 · 2 · Updated 6 days ago

5600xt, 5700xt, amdgpu, building-instructions, chainner, +7

overcrash66/OpenTranslator

Open Translator: a speech-to-speech and speech-to-text translator with voice cloning and other features

Python · 14 · 4 · Updated 1 week ago

audio-translation, autosub, coqui-tts, gtts-api, llama2, +12

BaoNguyen6742/uv-install-torch

Tutorial to install torch/pytorch with cuda using uv

Shell · 13 · 1 · Updated 4 days ago

cuda, install, installation, package, python, +7

BakingBrains/Sound_Classification

Sound classification on Urban Sound Dataset

Jupyter Notebook · 11 · 4 · Updated 2 months ago

pytorch, sound-classification, sound-classification-spectrograms, torchaudio, urban-sound-classification

eonu/torch-fsdd

A utility for wrapping the Free Spoken Digit Dataset into PyTorch-ready data set splits.

Python · 9 · 3 · Updated 7 months ago

audio, audio-dataset, data-loader, free-spoken-digit-dataset, fsdd, +5

aminul-huq/Speech-Command-Classification

Speech command classification on the Speech-Command v0.02 dataset using PyTorch and torchaudio. In this example, three models are trained, one each on raw signal waveforms, MFCC features, and MelSpectrogram features.

Python · 9 · 5 · Updated 1 year ago

classification, dnn, pytorch-tutorial, speech, speech-recognition, +1

LukeSutor/programmatic-pitch

High fidelity music synthesis using diffusion and UnivNet.

Python · 9 · 2 · Updated 1 year ago

diffusion, gan, generative-model, pytorch, torchaudio

nipponjo/tts-german-pytorch

🎙️ German TTS (FastPitch) with Thorsten voice / emotional

Python · 9 · 0 · Updated 3 months ago

deep-learning, emotional-speech, fastpitch, german, german-language, +8

CrispenGari/animal-sound-classification

A simple artificial neural network model that uses deep learning and torchaudio to classify cat and dog sounds.

Jupyter Notebook · 8 · 1 · Updated 7 months ago

artificial-intelligence, artificial-neural-networks, audio, audio-processing, deep-learning, +6

thekartikeyamishra/VoiceCloner

The Voice Cloner is a Python-based project that leverages Tacotron 2 and WaveGlow models for text-to-speech (TTS) synthesis and basic voice cloning. This project supports 22 official Indian languages, including Sanskrit, making it versatile for multilingual text input.

Python · 8 · 3 · Updated 2 weeks ago

ai, indic-transliteration, librosa, machine-learning, numpy, +6

CrispenGari/emotionAI

(😞 😨 😄 😮 😍 😠 😐 🤮) A simple deep learning API that classifies human emotions from audio and text.

Jupyter Notebook · 7 · 1 · Updated 2 years ago

artificial-intelligence, deeplearning, flask, machine-learning, python, +4

PRITHIVSAKTHIUR/Qwen3-TTS-Daggr-UI

Demonstration of the Qwen/Qwen3-TTS-12Hz models using Daggr for modular UI nodes. Supports voice design (prompt-to-speech), zero-shot voice cloning, and custom voice synthesis with multiple speakers and languages. Features lazy model loading to reduce memory use, multiple model sizes (0.6B and 1.7B), ASR, and support for various audio inputs.

Python · 7 · 0 · Updated 2 weeks ago

asr, audio-processing, daggr, gradio, huggingface-transformers, +13

glefundes/misophonia-bot

🤖 Telegram bot powered by deep learning. Automatically assesses whether audio clips and voice messages are safe for people with misophonia.

Python · 6 · 0 · Updated 3 years ago

audio, audio-classification, deep-learning, pytorch, telegram, +3

pradeepbatchu/speechtotext

Speech to Text with Wav2Vec2 using torchaudio

Python · 6 · 1 · Updated 4 months ago

flask, speech-to-text, torch, torchaudio, torchlight, +3

LumenPallidium/audio_generation

Experiments in neural networks for audio generation.