28 results for “topic:speech-dataset”
Feature extraction from the speech signal is the initial stage of any speech recognition system.
ManaTTS is the largest open Persian speech dataset with 114+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
A Python library for generating speech datasets from YouTube videos
EmoTa is an open-access Tamil Speech Emotion Recognition dataset with 936 utterances from 22 native speakers, covering five emotions (anger, happiness, sadness, fear, and neutrality). It supports emotion classification tasks and advances Tamil language processing.
No description provided.
[T-IFS] RNN-SM: Fast Steganalysis of VoIP Streams Using Recurrent Neural Network
A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.
Construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wakeword detection).
Deepfake cross-lingual evaluation dataset (DECRO) is constructed to evaluate the influence of language differences on deepfake detection.
Voice activity detection and speaker gender segmentation audiovisual corpus
Download speech datasets (English and non-English) for Automatic Speech Recognition
A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
NumPy-librosa implementation of a speech dataset pipeline
A robust forced alignment tool for low-resource languages using multiple ASR models and CER-based matching. Built for noisy data and imperfect transcripts.
🇧🇮 The first large-scale, open-source speech and text dataset for the Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.
Persian spoken digit recognition
A simple CNN-LSTM deep neural model using TensorFlow to classify emotions from a speech dataset
A full-stack webapp for collecting and managing speech datasets.
No description provided.
A dataset of informal Persian audio and text chunks, along with a fully open processing pipeline, suitable for ASR and TTS tasks. Created from crawled content on virgool.io.
Top dataset for voice conversion models
A corpus of speech recordings in 50 languages
No description provided.
🧠️🖥️2️⃣️0️⃣️0️⃣️1️⃣️🎼️🎷️ The audio:speeches category for AI2001, containing speech datasets
Simple script that creates a speech dataset quickly
393-Hours-Korean-Children-Speech-Data-by-Mobile-Phone
2-People-New-Zealand-English-Average-Tone-Speech-Synthesis-Corpus
This interactive Python tool enables the recording of bilingual audio samples using PyAudio and ipywidgets. Designed for data collection tasks such as speech datasets, it provides a user-friendly interface to record, save, label, and manage audio files directly within a Jupyter Notebook.