392 results for “topic:automatic-speech-recognition”
Production First and Production Ready End-to-End Speech Recognition Toolkit
OpenAI Whisper ASR Webservice API
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Voice Activity Detector (VAD): low-latency, high-performance and lightweight
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
:zap: TensorFlowASR: Almost state-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that use characters or subwords
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Open STT
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Streaming transcriber with whisper
On-device streaming speech-to-text engine powered by deep learning
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
End-to-end ASR/LM implementation with PyTorch
This is a list of features, scripts, blogs and resources for better using Kaldi ( http://kaldi-asr.org/ )
A project dedicated to making CPU/on-device models approach GPU-model performance, with a real-time factor (RTF) below 0.1 on CPU
On-device speech-to-text engine powered by deep learning
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
The dataset of Speech Recognition
🔉 Youtube Videos Transcription with OpenAI's Whisper
[LREC-COLING 2024 (Oral), Interspeech 2024 (Oral), NAACL 2025, ACL 2025, EMNLP 2025] A Series of Multilingual Multitask Medical Speech Processing
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Wav2Vec for speech recognition, classification, and audio classification
speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names.
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory
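Several entries above evaluate speech-to-text output with word error rate (WER). As a generic illustration (not the API of any toolkit listed here), WER is the word-level Levenshtein distance between a reference and a hypothesis transcript, divided by the reference length; the `wer` function below is a hypothetical minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word ("the") out of six reference words → WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions relative to a short reference.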