Daniel-McLarty/Python-Autodub
An automated, AI-powered video dubbing pipeline that extracts audio, separates vocals from background noise (using Demucs), diarizes speakers (using Pyannote), and generates translated voice clones (using F5-TTS). It then re-assembles the audio using fast, frame-accurate numpy math and muxes it back into a final MKV video file.
Python Autodub
An automated, AI-powered video dubbing pipeline that extracts audio, separates vocals from background noise (using Demucs), diarizes speakers (using Pyannote), and generates translated voice clones (using F5-TTS). It then re-assembles the audio using fast, frame-accurate numpy math and muxes it back into a final MKV video file.
Copyright (C) Daniel McLarty 2026
Features
- Vocal Separation: Isolates background noise and music from dialogue using 4-stem htdemucs.
- Speaker Diarization: Identifies multiple speakers in the audio using Pyannote. (Automatically bypassed when targeting a single speaker to save massive amounts of VRAM).
- Voice Cloning: Automatically extracts clean samples for each identified speaker and uses F5-TTS to generate high-fidelity, translated lines.
- Hybrid Cloning: Optional setting to blend the original emotional cadence of the source vocals with the base voice clone.
- Smart Audio Assembly: Features a "Shrink-Only Guillotine" that uses phase vocoder time-stretching to strictly confine AI-generated audio to its exact subtitle window, preventing dialogue overlap.
- Latency Auto-Trimming: Automatically scans and trims dead air or low-volume rumbling from the start of generated lines to ensure frame-perfect lip-sync timing.
- Graphical Interface: Easy-to-use Tkinter GUI to configure thresholds, workers, file paths, and output settings (with native Windows 11 Light/Dark theming).
- Smart Resume & Caching: Automatically hashes input files to detect changes, allowing the pipeline to seamlessly resume interrupted jobs and skip heavy processing steps, saving hours of GPU time.
Prerequisites
- NVIDIA GPU: A CUDA-compatible GPU is required.
- Hugging Face Account: You need a Hugging Face token to use the Pyannote diarization models. You must also visit the following pages while logged in to accept their user conditions:
FFmpeg Dependency & Licensing
To handle high-performance audio normalization and video muxing, this project utilizes FFmpeg.
Installation
- Windows: A custom-built, optimized FFmpeg binary is included in the
bin/folder. No additional installation is required. - Linux: Please install FFmpeg via your system's package manager:
- Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg - Fedora:
sudo dnf install ffmpeg - Arch:
sudo pacman -S ffmpeg
- Ubuntu/Debian:
Licensing & LGPL Compliance
This software uses a custom build of FFmpeg licensed under the GNU Lesser General Public License (LGPL) version 2.1.
- No Changes: We have not modified the FFmpeg source code.
- License Text: A copy of the LGPL v2.1 is provided in
bin/FFMPEG_LGPL. - Build Instructions: Details on how this binary was configured and compiled can be found in
bin/build_info.md. - Source Code: You can obtain the official FFmpeg source code at ffmpeg.org.
Installation & Usage
Python Autodub uses uv for lightning-fast, reproducible dependency management. The included launchers will automatically detect your environment, download necessary build tools, and sync the CUDA-enabled libraries before launching the UI.
Windows
- Double-click
Python-Autodub.exe(or runLaunch_UI.ps1if running from source). - The launcher will automatically configure MSVC Build Tools, sync the
uvenvironment, and launch the GUI.
Linux
- Open your terminal in the project root.
- Run
bash install_linux_shortcut.shto automatically configure execution permissions and add the app to your desktop environment's application grid. - Launch "Python Autodub Studio" from your app menu, or run
src/Launch_UI.shdirectly from the terminal.
Folder Structure & Artifacts
To keep the root directory clean, the project organizes files dynamically:
src/: Contains all python scripts, launchers, and base voice templates.temp/: Generated during execution. Holds all intermediate files including separated vocals (htdemucs/), voice samples (base_clones/), individual generated lines (temp_lines/), and intermediate audio mixing tracks.output/: Generated upon completion. Contains the finalFINAL_DUBBED_MOVIE.mkv..uv_cache/: Localized package cache to support cross-drive hard linking and prevent phantom disk usage.