erogol/FFTNet

FFTNet vocoder implementation

deep-learning fftnet pytorch text2speech vocoder

Unofficial implementation of the FFTNet vocoder paper.

implement the model.
implement tests.
overfit on a single batch (sanity check).
linearize weights for eval time.
measure the run-time on GPU and CPU (1 sec of audio takes ~47 secs). If anyone knows additional tricks from the paper, let me know. I have asked the authors, but nobody has replied so far.
train on LJSpeech spectrograms.
distill model as in Parallel WaveNet paper.
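
The run-time figure above can be expressed as a real-time factor (RTF): seconds of compute per second of generated audio, so ~47 secs for 1 sec of audio is an RTF of about 47. Below is a minimal sketch of such a measurement using only the standard library. The names `real_time_factor` and `dummy_vocoder` are hypothetical and not part of this repo, the 16 kHz sample rate is an assumption, and the dummy loop only stands in for the model's sample-by-sample generation.

```python
import time


def real_time_factor(synthesize, n_samples, sample_rate):
    """Time one call to synthesize(n_samples) and return (elapsed_seconds, rtf).

    rtf = compute seconds per second of generated audio; below 1.0 means
    faster than real time.
    """
    start = time.perf_counter()
    synthesize(n_samples)
    elapsed = time.perf_counter() - start
    audio_seconds = n_samples / sample_rate
    return elapsed, elapsed / audio_seconds


def dummy_vocoder(n_samples):
    # Hypothetical stand-in for an autoregressive vocoder: each output
    # sample depends on the previous one, so the loop cannot be batched.
    out = []
    x = 0.0
    for _ in range(n_samples):
        x = 0.5 * x + 0.1
        out.append(x)
    return out


# Assumed sample rate of 16 kHz, so 16000 samples correspond to 1 sec of audio.
elapsed, rtf = real_time_factor(dummy_vocoder, n_samples=16000, sample_rate=16000)
print(f"{elapsed:.4f} s for 1 s of audio, RTF = {rtf:.4f}")
```

When timing a PyTorch model on GPU, CUDA kernels run asynchronously, so call `torch.cuda.synchronize()` before reading the clock on both sides of the measured call; otherwise the elapsed time can be underestimated.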

Languages

Jupyter Notebook 87.4%, Python 12.5%, Shell 0.0%

Mozilla Public License 2.0

Created June 25, 2018

Updated January 4, 2024
