"topic:mfcc" — Search

Front-end speech processing aims to extract suitable features from short-term segments of a speech utterance, known as frames. It is a prerequisite step for any pattern recognition problem involving speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Python · 257 · 64 · Updated 4 days ago

classifier, feature-extraction, gaussian-mixture-models, isomap, kernel-pca, linear-discriminant-analysis, linear-prediction-coefficients, locally-linear-embedding, long-short-term-memory, mfcc, natural-language-processing, nlp, nltk, principal-component-analysis, spectral-clustering, spectral-embedding, speech-processing, speech-utterance, support-vector-machines
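
For readers unfamiliar with the front-end described in this entry, the following is a minimal, pure-Python sketch of MFCC extraction from framed audio. It is not taken from this repository. The function names (`frame_signal`, `power_spectrum`, `mel_filterbank`, `mfcc_frame`), the 2595·log10(1 + f/700) mel formula, and all parameter values (8 kHz sampling, 25 ms frames, 10 ms hop, 20 filters, 13 coefficients) are illustrative assumptions.

```python
import math

def hz_to_mel(f):
    # One common mel-scale formula (assumed here; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Apply a Hamming window, then return |DFT|^2 / N for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(w))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(w))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft  # center frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, bank, n_coeffs):
    """Log mel-filterbank energies followed by an unnormalized DCT-II."""
    spec = power_spectrum(frame)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    m = len(log_e)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / m) for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Synthetic input: 0.1 s of a 440 Hz tone sampled at 8 kHz (illustrative only).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(800)]
frames = frame_signal(signal, frame_len=200, hop=80)   # 25 ms frames, 10 ms hop
bank = mel_filterbank(n_filters=20, n_fft=200, sample_rate=sample_rate)
features = [mfcc_frame(f, bank, n_coeffs=13) for f in frames]
print(len(features), len(features[0]))
```

Each row of `features` is one frame's cepstral vector, the kind of input a GMM or K-nearest neighbor classifier would consume. A practical implementation would use an FFT and typically add pre-emphasis and liftering, which are omitted here.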

sp-nitech/SPTK

A suite of speech signal processing tools

C++ · 243 · 28 · Updated 1 month ago

audio-processing, cepstrum, cpp, dsp, lpc, lsp, mfcc, signal-processing, speech, speech-processing, sptk, unix-command

ewan-xu/LibrosaCpp

LibrosaCpp is a C++ implementation of librosa for computing short-time Fourier transform coefficients, mel spectrograms, or MFCCs

C++ · 232 · 54 · Updated 13 hours ago

eigen, librosa, mfcc

jsingh811/pyAudioProcessing

Audio feature extraction and classification

Python · 227 · 41 · Updated 5 months ago

audio-data, audio-files, chroma-features, classifier, classifier-options, classify, classify-audio, classify-audio-samples, feature-extraction, gfcc, gfcc-extractor, gfcc-features, hyperparameter-tuning, mfcc, mfcc-extractor, mfcc-features, pyaudioprocessing, spectral-features, wav-files

SuperKogito/Voice-based-gender-recognition

:sound: :boy: :girl: Voice-based gender recognition using Mel-frequency cepstrum coefficients (MFCC) and Gaussian mixture models (GMM)

Python · 221 · 67 · Updated 1 week ago

data-science, gaussian-mixture-models, gender, gender-classification, gender-detection, gender-recognition, gender-recognition-by-voice, gmm, machine-learning, mel-frequencies, mfcc, scikit-learn, scikit-learn-python, signal, speaker, speech, vocal, voice

csukuangfj/kaldifeat

Kaldi-compatible online and offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd. Provides C++ and Python APIs

C++ · 214 · 37 · Updated 1 day ago

cpp, fbank, features-extraction, kaldi, mfcc, online-feature-extractor, plp, python, pytorch, streaming-feature-extractor

sp-nitech/diffsptk

A differentiable version of SPTK

Python · 195 · 19 · Updated 1 day ago

cepstrum, cqt, ddsp, deep-learning, digital-signal-processing, dsp, gmm, k-means, lpc, lsp, mdct, mfcc, nmf, plp, pqmf, python, pytorch, signal-processing, sptk, stft

SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

C · 179 · 46 · Updated 4 weeks ago

artificial-intelligence, colab-notebook, convolutional-neural-networks, deep-learning, diarization, emotion-analysis, emotion-recognition, keras-tensorflow, machine-learning, mfcc, mfcc-analysis, speech-processing, uis-rnn

tympanix/subsync

Synchronize your subtitles using machine learning

Python · 159 · 15 · Updated 1 month ago

delay, fix, machine-learning, mfcc, neural-network, shift, shift-subtitle, speech-detection, subsync, subtitle, subtitles

amanbasu/speech-emotion-recognition

Detecting emotions using MFCC features of human speech using Deep Learning

Jupyter Notebook · 133 · 38 · Updated 1 month ago

deep-learning, emotion, emotion-recognition, mfcc, rnn, speech-recognition, tensorflow

ZhuoZhuoCrayon/AcousticKeyBoard-Web

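❓ Acoustic keyboard | A wild idea: build a "toy" that can tell which key was pressed from the sound of the keystroke, as a way to learn signal processing / deep learning / Android / Django.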

Python · 88 · 6 · Updated 4 months ago

deep-learning, django, lstm, mfcc, tensorflow

GauravWaghmare/Speaker-Identification

A program for automatic speaker identification using deep learning techniques.

Python · 84 · 26 · Updated 2 months ago

keras, mfcc, speaker-recognition, speaker-verification

MycroftAI/sonopy

A simple audio feature extraction library

Python · 81 · 21 · Updated 4 months ago

audio-processing, library, mel-spectrogram, mfcc, sound, spectrogram

mathquis/node-personal-wakeword

Personal wake word detector

JavaScript · 69 · 9 · Updated 1 month ago

dtw, hotword-detection, hotword-detector, mfcc, node, wakeword

ZitengWang/python_kaldi_features

Python code to extract MFCC and FBANK speech features for Kaldi

Python · 67 · 18 · Updated 5 months ago

kaldi, mfcc

stefantaubert/mel-cepstral-distance

A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".

Python · 65 · 12 · Updated 4 days ago

cepstral, distance, distortion, divergence, dtw, dynamic-time-warping, language, linguistics, mcd, mel, mfcc, objective-evaluation, spectrogram, spectrum, speech-quality, speech-synthesis, text-to-speech, tts, voice-cloning
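
As a rough illustration of the measure this entry names, the sketch below computes a frame-averaged Mel-Cepstral Distance in dB using the commonly cited form (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²). It is not this library's API. The function name `mel_cepstral_distance` is hypothetical, the two inputs are assumed to be already time-aligned and of equal length, and skipping the zeroth (energy) coefficient is an assumed convention.

```python
import math

def mel_cepstral_distance(frames_a, frames_b):
    """Mean per-frame MCD in dB between two aligned sequences of cepstral vectors.

    Assumptions of this sketch: both sequences have the same number of frames,
    and coefficient 0 (overall energy) is excluded from the distance.
    """
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("inputs must be non-empty and have the same number of frames")
    scale = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = 0.0
    for a, b in zip(frames_a, frames_b):
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((x - y) ** 2 for x, y in zip(a[1:], b[1:]))
        total += scale * math.sqrt(squared)
    return total / len(frames_a)

# Toy cepstral vectors (made up for illustration, not real speech features).
reference = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
synthesized = [[1.2, 0.4, -0.2], [0.8, 0.4, 0.1]]
print(mel_cepstral_distance(reference, synthesized))
```

Utterances of different lengths would first need alignment, for example by dynamic time warping, before a frame-wise distance like this applies.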

k-farruh/speech-accent-detection

Everyone speaks a language with an accent, and a particular accent reflects the speaker's linguistic background. The model identifies the accent from an audio recording. Its output could be used to determine accents and to help students learning English reduce or improve their accents through training.

Python · 65 · 13 · Updated 3 days ago

accent, accent-detection, english-languages, mfcc, native-speakers

georgid/AlignmentDuration

Lyrics-to-audio alignment system based on machine learning algorithms: hidden Markov models with Viterbi forced alignment. The alignment is explicitly aware of the durations of musical notes. The phonetic models are classified with an MLP deep neural network.

Python · 59 · 6 · Updated 1 month ago

alignment, decoding, deep-learning, duration, gmm, hidden-markov-model, htk, lyrics, mfcc, music, music-information-retrieval, neural-networks, python, research, signal-processing, synchronization, upf

zafarrafii/Zaf-Python

Zafar's Audio Functions in Python for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.

Jupyter Notebook · 58 · 11 · Updated 3 weeks ago

audio-signal-processing, chromagram, constant-q-transform, cqt-kernel, cqt-spectrogram, dct, discrete-cosine-transform, discrete-sine-transform, dst, inverse-mdct, inverse-stft, mdct, mel-filterbank, mel-frequency-cepstral-coefficients, mel-spectrogram, mfcc, modified-discrete-cosine-transform, python, short-time-fourier-transform, stft

alicex2020/Deep-Learning-Lie-Detection

Use machine learning models to detect lies based solely on acoustic speech information

Jupyter Notebook · 56 · 11 · Updated 7 hours ago

acoustic-features, deep-learning, ensemble-classifier, ensemble-learning, ensemble-machine-learning, ensemble-model, lie-detector, machine-learning, mfcc, mfcc-analysis, pitch-tracking, support-vector-machines

SuperKogito/Voice-based-speaker-identification

:sound: :boy: :girl: :woman: :man: Speaker identification using voice MFCCs and GMM

Python · 54 · 15 · Updated 11 months ago

gaussian-mixture-models, gmm, machine-learning, mel-frequencies, mel-frequency-cepstral-coefficients, mfcc, scikit-learn, scikit-learn-python, signal, speaker-identification, speaker-recognition, speech, vocal, voice

supikiti/PNCC

An implementation of Power Normalized Cepstral Coefficients (PNCC)

Python · 54 · 9 · Updated 3 months ago

deep-learning, mfcc, pncc, robustness, speech-enhancement, speech-processing, speech-recognition

aubio/vamp-aubio-plugins

aubio plugins for Vamp

C++ · 50 · 10 · Updated 1 month ago

analysis, aubio, audio, beat, beat-detection, beat-tracking, mfcc, music, music-information-retrieval, onset, onset-detection, tempo, tempo-detection, tempo-tracking, vamp-plugins
