hackerpeter1/SVQTD

Instructions for requesting the data are on the project page.

Dataset preparation

Download the YouTube videos with a Python script and convert them to audio using ffmpeg.
Perform music source separation with Spleeter.
Apply energy-based segmentation; reference code can be found in ./split.py.
Extract the feature set using openSMILE (optional; only needed if you want to train with the traditional feature set).
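The first step above converts the downloaded videos to audio with ffmpeg. Below is a minimal sketch of how a Python script might build that ffmpeg call. It assumes 44.1 kHz stereo WAV output, which is our choice and not a setting taken from the SVQTD scripts. The downloader itself is not shown, and the file names are hypothetical. The flags used (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard ffmpeg options.

```python
import shlex
from pathlib import Path


def ffmpeg_to_wav_cmd(video_path, out_dir, sample_rate=44100, channels=2):
    """Build an ffmpeg command that drops the video stream and writes a WAV file.

    The 44.1 kHz stereo defaults are assumptions, not values from the SVQTD code.
    """
    video_path = Path(video_path)
    out_path = Path(out_dir) / (video_path.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(video_path),    # input: a downloaded video file
        "-vn",                    # discard the video stream, keep audio only
        "-ac", str(channels),     # number of output audio channels
        "-ar", str(sample_rate),  # output sampling rate in Hz
        str(out_path),            # .wav extension selects WAV (PCM) output
    ]


if __name__ == "__main__":
    cmd = ffmpeg_to_wav_cmd("downloads/clip01.mp4", "audio")
    print(shlex.join(cmd))
    # To actually run it (requires ffmpeg on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning the argument list instead of a shell string lets it be passed directly to `subprocess.run` without shell quoting issues.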
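For the energy-based segmentation step, the following is a generic sketch and not a copy of ./split.py. It works on a mono signal given as a list of floats. It computes frame-wise RMS energy and marks frames above a fraction of the maximum energy as active. It then bridges short silences and drops very short segments. The frame size, hop, threshold ratio and duration limits are all hypothetical defaults.

```python
import math


def frame_rms(samples, frame_len, hop):
    """RMS energy of each full frame (one short frame if the signal is shorter)."""
    if not samples:
        return []
    energies = []
    last_start = max(len(samples) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def energy_segments(samples, sr, frame_len=1024, hop=512,
                    threshold_ratio=0.1, min_gap_s=0.3, min_len_s=1.0):
    """Return (start_s, end_s) spans of a mono signal whose frame energy is high.

    A frame is active when its RMS exceeds threshold_ratio * (max frame RMS).
    Active spans separated by less than min_gap_s are merged, and spans
    shorter than min_len_s are discarded. All defaults are hypothetical.
    """
    energies = frame_rms(samples, frame_len, hop)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    duration = len(samples) / sr

    # Group consecutive active frames into [first_frame, last_frame] runs.
    runs = []
    for i, e in enumerate(energies):
        if e > threshold:
            if runs and runs[-1][1] == i - 1:
                runs[-1][1] = i
            else:
                runs.append([i, i])

    # Convert frame indices to seconds; frame i covers [i*hop, i*hop + frame_len).
    spans = [(a * hop / sr, min((b * hop + frame_len) / sr, duration))
             for a, b in runs]

    # Bridge short pauses, then drop segments that are too short.
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] < min_gap_s:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_len_s]
```

Each (start, end) pair can then be used to cut clips from the separated vocal track, for example with ffmpeg's `-ss` and `-to` options. Because the threshold is relative to the loudest frame, it adapts to tracks recorded at different levels. A single very loud transient can still raise it, though.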

Training files

Pooling methods for the recognition neural networks can be found in ./modules.
Model definitions are in ./models.
Config files for training the Transformer and the ResNet, respectively, are in ./config.
./E2E.py trains the neural networks from these config files.
./RPSVM.py extracts embeddings and trains an SVM classifier on them.
./FSSVM.py trains an SVM classifier on features from the ComParE feature set.
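The README does not say which pooling methods ./modules contains. As an illustration only, statistics pooling and its attentive variant are two common ways to turn frame-level network outputs (T frames of D features) into one fixed-size vector. The sketch below implements both in plain Python. The attention vector `w` is a hypothetical stand-in for learned parameters.

```python
import math


def _weighted_stats(frames, weights):
    """Weighted per-dimension mean and standard deviation, concatenated."""
    dim = len(frames[0])
    means = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    stds = [math.sqrt(sum(a * (f[d] - means[d]) ** 2 for a, f in zip(weights, frames)))
            for d in range(dim)]
    return means + stds


def statistics_pooling(frames):
    """Mean and std over time; output length is 2 * D regardless of T."""
    if not frames:
        raise ValueError("need at least one frame")
    return _weighted_stats(frames, [1.0 / len(frames)] * len(frames))


def attentive_statistics_pooling(frames, w):
    """Like statistics_pooling, but frames are weighted by softmax(w . frame).

    w (length D) is a hypothetical stand-in for learned attention parameters;
    real models usually compute the scores with a small neural network.
    """
    if not frames:
        raise ValueError("need at least one frame")
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in frames]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return _weighted_stats(frames, [e / total for e in exps])
```

With `w` all zeros every frame gets the same weight, so the attentive version reduces to plain statistics pooling. This makes a convenient sanity check.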
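Both ./RPSVM.py and ./FSSVM.py end with an SVM classifier, and the README does not say which SVM implementation they use. The sketch below trains a binary linear SVM from scratch with the Pegasos stochastic sub-gradient method. It is only meant to show what that final stage optimizes; it is not the scripts' method. The function names and hyperparameters are hypothetical. A multi-class task would need, for example, one-vs-rest over several such classifiers.

```python
import random


def train_linear_svm(X, y, lam=0.01, iters=2000, seed=0):
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    X: list of feature vectors (e.g. embeddings or ComParE features).
    y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (this bias is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        # Shrink the weights (sub-gradient of the L2 regularizer) ...
        w = [(1.0 - eta * lam) * wj for wj in w]
        # ... and, if the hinge loss is active, step toward the sample.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Sign of the linear score, using the same appended bias feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice the features are usually standardized before training, since the SVM margin is sensitive to feature scales. This matters especially for hand-crafted sets like ComParE.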

Since the code is not very user-friendly, please feel free to contact me at yanze.xu@outlook.com if you have any questions about downloading the dataset or about the training code. You are also welcome to get in touch if you are interested in timbre phenomena.
