# AutoDub - Automatic Video Translation & Dubbing

Automatically transcribe, translate, and dub videos into different languages using AI-powered text-to-speech.
## Features

- 🎙️ Speech Recognition: Transcribe audio using OpenAI Whisper
- 🌍 Translation: Translate to 100+ languages via Google Translate
- 🗣️ Three TTS Engines (an Edge TTS sketch follows this list):
  - Edge TTS: High-quality Microsoft voices (recommended)
  - Silero: Fast Russian TTS (offline after the first download)
  - XTTS: Voice cloning from a 6-10 second sample
- 🎬 Video Preservation: Keeps the original video track and mixes the original audio (20%) with the dubbed audio (150%)
- 📝 Subtitle Generation: Creates SRT files with the translated text
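
To make the per-segment synthesis concrete, here is a minimal sketch using the `edge-tts` package's `Communicate` API. The voice name and segment data are illustrative, not taken from AutoDub's code:

```python
# Minimal per-segment Edge TTS sketch (pip install edge-tts).
# Hypothetical segments: (start, end, translated_text) as the pipeline produces them.
import asyncio

import edge_tts

segments = [
    (0.0, 2.5, "Привет, мир!"),
    (2.5, 5.0, "Это дублированное видео."),
]

async def synthesize(segments, voice="ru-RU-DmitryNeural"):
    for i, (_start, _end, text) in enumerate(segments):
        communicate = edge_tts.Communicate(text, voice)
        await communicate.save(f"segment_{i:03d}.mp3")  # one clip per subtitle

asyncio.run(synthesize(segments))
```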
## Requirements

### System Dependencies

```bash
# Fedora/RHEL
sudo dnf install ffmpeg python3.10 python3.10-devel

# Ubuntu/Debian
sudo apt install ffmpeg python3.10 python3.10-dev

# macOS
brew install ffmpeg
```

### Python Dependencies
```bash
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install openai-whisper pysrt edge-tts deep-translator soundfile tqdm
pip install TTS  # Only needed for XTTS voice cloning
```

## Local Translation with Ollama (Optional)
AutoDub now supports fully offline translation using Ollama. This is ideal for privacy, avoiding API limits, and achieving more context-aware translations.
1. Install Ollama and pull a model. For Linux (Fedora/Ubuntu/etc.):

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
```

## Quick Start
Follow these steps to set up and start dubbing your videos:
1. Prepare Files

   Ensure you have the following files in your project directory:

   - `setup.sh` (the installer)
   - `autodub_v4_1.py` (the main engine)
   - `install.txt` (the list of dependencies)
2. Run Installation

   Open a terminal in the project folder and execute:

   ```bash
   chmod +x setup.sh && ./setup.sh
   ```

### Basic Usage (Edge TTS - Recommended)
```bash
# Dub to Russian (default)
python autodub.py video.mp4

# Dub to English
python autodub.py video.mp4 --target_lang en

# Dub to German
python autodub.py video.mp4 --target_lang de
```

### Silero TTS (Faster, Russian only)
```bash
# Default voice (aidar)
python autodub.py video.mp4 --tts silero

# Female voice
python autodub.py video.mp4 --tts silero --silero_voice xenia

# Available voices: aidar, baya, kseniya, xenia, eugene
```

### XTTS Voice Cloning (Most Natural)
```bash
# Requires a 6-10 second clean voice sample
python autodub.py video.mp4 --tts xtts --ref_voice my_voice.wav --target_lang en
```
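
Under the hood, cloning comes down to a single Coqui TTS call. A minimal sketch, assuming `pip install TTS`; the model identifier is Coqui's published XTTS v2 name, and the text and paths are examples:

```python
# Voice-cloning sketch with Coqui TTS (pip install TTS).
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # downloads on first use
tts.tts_to_file(
    text="Hello, this is a cloned voice.",
    speaker_wav="my_voice.wav",  # the 6-10 second clean reference sample
    language="en",
    file_path="cloned_segment.wav",
)
```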
### Ollama Translator

```bash
# Use Ollama with the default llama3 model
./run.sh video.mp4 --translator ollama

# Use a specific model (e.g., Mistral)
./run.sh video.mp4 --translator ollama --ollama_model mistral
```

## Command-Line Options
```text
positional arguments:
  video                 Input video file

options:
  -h, --help            Show help message
  --tts {edge,silero,xtts}
                        TTS engine (default: edge)
  --target_lang LANG    Target language code (default: ru)
                        Supports: ru, en, de, fr, es, it, pt, ja, zh, etc.
  --silero_voice {aidar,baya,kseniya,xenia,eugene}
                        Silero voice for Russian (default: aidar)
  --ref_voice FILE      Reference WAV for XTTS voice cloning
  --keep-temp           Keep temporary files after processing
```
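
For reference, the documented flags could be declared with `argparse` roughly like this. This is a sketch, not AutoDub's actual parser; the `--translator`/`--ollama_model` flags from the Ollama examples are included, and the `google` default for `--translator` is an assumption:

```python
# Hypothetical argparse declaration mirroring the documented CLI.
import argparse

parser = argparse.ArgumentParser(description="Automatic video translation & dubbing")
parser.add_argument("video", help="Input video file")
parser.add_argument("--tts", choices=["edge", "silero", "xtts"], default="edge",
                    help="TTS engine (default: edge)")
parser.add_argument("--target_lang", metavar="LANG", default="ru",
                    help="Target language code, e.g. ru, en, de (default: ru)")
parser.add_argument("--silero_voice", default="aidar",
                    choices=["aidar", "baya", "kseniya", "xenia", "eugene"],
                    help="Silero voice for Russian (default: aidar)")
parser.add_argument("--ref_voice", metavar="FILE",
                    help="Reference WAV for XTTS voice cloning")
parser.add_argument("--translator", choices=["google", "ollama"], default="google",
                    help="Translation backend (default assumed: google)")
parser.add_argument("--ollama_model", default="llama3",
                    help="Ollama model to use (default: llama3)")
parser.add_argument("--keep-temp", action="store_true",
                    help="Keep temporary files after processing")
args = parser.parse_args()
```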
## Supported Languages

Edge TTS supports 100+ languages. Common codes:

- `ru` - Russian
- `en` - English
- `de` - German
- `fr` - French
- `es` - Spanish
- `it` - Italian
- `pt` - Portuguese
- `ja` - Japanese
- `zh` - Chinese

Full list: https://speech.microsoft.com/portal/voicegallery
## Output

The script generates:

- `{video}_dubbed.mp4` - Video with dubbed audio
- `{video}_{lang}.srt` - Subtitle file with the translations (see the sketch below)
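
Writing the SRT is a few lines with `pysrt`, which is already in the dependency list. A minimal sketch; the segment data and output name are examples:

```python
# SRT-writing sketch with pysrt (pip install pysrt).
import pysrt

def to_time(seconds: float) -> pysrt.SubRipTime:
    """Convert float seconds into pysrt's h/m/s/ms representation."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return pysrt.SubRipTime(hours=h, minutes=m, seconds=s, milliseconds=ms)

segments = [(0.0, 2.5, "Привет, мир!"), (2.5, 5.0, "Это дублированное видео.")]

subs = pysrt.SubRipFile()
for i, (start, end, text) in enumerate(segments, start=1):
    subs.append(pysrt.SubRipItem(index=i, start=to_time(start), end=to_time(end), text=text))
subs.save("video_ru.srt", encoding="utf-8")
```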
## Performance
| Engine | Speed | Quality | Languages | Notes |
|---|---|---|---|---|
| Edge TTS | Fast | ⭐⭐⭐⭐⭐ | 100+ | Best quality, requires internet |
| Silero | Very Fast | ⭐⭐⭐ | Russian only | Offline, robotic |
| XTTS | Slow | ⭐⭐⭐⭐⭐ | 16 | Voice cloning, GPU recommended |
## Troubleshooting

### "No module named 'soundfile'"

```bash
pip install soundfile
```

### "TorchCodec is required"

This is already patched in the code. If you still see it, update PyTorch:

```bash
pip install --upgrade torch torchaudio
```

### Silero model download fails

The script auto-downloads the model (~40 MB) on first run. Check your internet connection.

### XTTS out of memory

Use CPU mode or reduce the video length. For long videos, split them into segments (see the sketch below).
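
One way to split without re-encoding, wrapped in Python to match the rest of these sketches (file names and the 10-minute chunk size are examples):

```python
# Split a long video into ~10-minute chunks with FFmpeg's segment muxer.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "video.mp4",
    "-c", "copy",              # stream copy: fast, no quality loss
    "-f", "segment",
    "-segment_time", "600",    # 600 s = 10-minute chunks
    "-reset_timestamps", "1",  # each chunk starts at t=0
    "part_%03d.mp4",
], check=True)
```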
### Poor voice quality with Silero

Use Edge TTS or XTTS instead. Silero is designed for speed, not quality.
## Ollama Integration Features
- Privacy: Your transcripts and translations never leave your local machine.
- Custom Context: LLMs can handle nuances, slang, and technical terms better than basic translators.
- Cost: 100% free with no character limits or subscription fees.
- Offline Workflow: Combined with Silero or XTTS, you can dub videos without an active internet connection.
| Feature | Google Translate | Ollama (Local LLM) |
|---|---|---|
| Speed | Instant | Depends on your GPU/RAM |
| Setup | Zero setup | Requires model download |
| Internet | Required | Not required |
| Quality | Literal / Standard | Contextual / Natural |
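
For the curious, a translation request against Ollama's local REST API (`http://localhost:11434/api/generate`) is a single JSON POST. A standard-library sketch; the prompt wording is illustrative, not AutoDub's actual prompt:

```python
# One-off translation request to a locally running Ollama server.
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": ("Translate the following subtitle into Russian. "
               "Reply with the translation only:\n\nHello, world!"),
    "stream": False,  # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```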
## Technical Details

### Processing Pipeline

1. Extract Audio: FFmpeg extracts a mono 16 kHz WAV
2. Transcribe: The Whisper "base" model transcribes with timestamps (steps 1-2 are sketched below)
3. Translate: The Google Translate API (or Ollama) translates the segments
4. Synthesize: The TTS engine generates speech for each subtitle
5. Merge: FFmpeg mixes the original (20%) and dubbed (150%) audio back into the video
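
Steps 1-2 can be reproduced in a few lines; a sketch assuming `ffmpeg` on PATH and `pip install openai-whisper`, with example file names:

```python
# Pipeline steps 1-2: extract mono 16 kHz audio, then transcribe with timestamps.
import subprocess

import whisper

subprocess.run([
    "ffmpeg", "-y", "-i", "video.mp4",
    "-vn",           # drop the video stream
    "-ac", "1",      # mono
    "-ar", "16000",  # 16 kHz sample rate
    "audio.wav",
], check=True)

model = whisper.load_model("base")
result = model.transcribe("audio.wav")
for seg in result["segments"]:  # each segment carries start/end timestamps
    print(f'{seg["start"]:7.2f} -> {seg["end"]:7.2f}  {seg["text"].strip()}')
```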
### Audio Mixing

- Original audio: 20% volume (background)
- Dubbed audio: 150% volume (foreground)
- Output: AAC at 128 kbps; video copied without re-encoding (one possible FFmpeg invocation is sketched below)
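
The mix corresponds to a single FFmpeg call along these lines (a sketch with example file names; AutoDub's exact invocation may differ):

```python
# Mix original (20%) and dubbed (150%) audio, copy the video stream untouched.
import subprocess

filter_graph = (
    "[0:a]volume=0.2[bg];"   # original audio as quiet background
    "[1:a]volume=1.5[fg];"   # dubbed track boosted to the foreground
    "[bg][fg]amix=inputs=2:duration=first[mix]"
)
subprocess.run([
    "ffmpeg", "-y",
    "-i", "video.mp4",       # original video + audio
    "-i", "dubbed.wav",      # synthesized dub track
    "-filter_complex", filter_graph,
    "-map", "0:v", "-map", "[mix]",
    "-c:v", "copy",          # video copied without re-encoding
    "-c:a", "aac", "-b:a", "128k",
    "video_dubbed.mp4",
], check=True)
```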
## License
MIT License - see LICENSE file
## Credits
- OpenAI Whisper - Speech recognition
- Edge-TTS - Microsoft TTS
- Silero Models - Russian TTS
- Coqui TTS - XTTS voice cloning
## Contributing
Issues and pull requests welcome!