Whisper Transcription API Service
A FastAPI-based service that transcribes audio files from AWS S3 using OpenAI's Whisper model. The service downloads audio files from S3, transcribes them using Whisper, and uploads the transcription results back to S3.
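The download, transcribe, upload flow described above can be sketched roughly as follows. This is only an illustration, not the code in main.py: the function name `transcribe_s3_object`, the `transcribe_fn` callback, and the ".txt" output key are all assumptions made for the example. The `s3` argument is expected to behave like a boto3 S3 client.

```python
import os
import tempfile


def transcribe_s3_object(s3, transcribe_fn, bucket, file_key):
    """Download an audio object, transcribe it, and upload the text.

    `s3` should offer download_file and put_object like a boto3 S3 client.
    `transcribe_fn` (hypothetical) maps a local audio path to a string.
    """
    suffix = os.path.splitext(file_key)[1]
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "audio" + suffix)
        # Step 1: fetch the audio file from S3 into a temporary directory
        s3.download_file(bucket, file_key, local_path)
        # Step 2: run the speech-to-text model on the local copy
        text = transcribe_fn(local_path)
    # Step 3: upload the transcript (the ".txt" suffix is an assumption)
    output_key = file_key + ".txt"
    s3.put_object(Bucket=bucket, Key=output_key, Body=text.encode("utf-8"))
    return output_key
```

Passing the client and the transcription function in as arguments keeps the sketch independent of any particular Whisper loading code, and lets the flow be exercised with stand-in objects.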
Features
- Transcribe audio files from AWS S3
- High-performance Whisper model with GPU acceleration (if available)
- Direct S3 integration for input and output
- RESTful API with FastAPI
- Cross-platform support (Windows, Linux, macOS)
- Health check endpoint
- Setup verification script
Requirements
- Python 3.8 or higher
- FFmpeg (required for audio processing)
- AWS credentials configured
- Audio files stored in AWS S3
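The two local requirements above (Python version and FFmpeg) can be checked from Python with the standard library alone. The sketch below is an assumption about how such a check might look. It is not the contents of check_setup.py, and the name `check_requirements` is hypothetical.

```python
import shutil
import sys


def check_requirements(min_python=(3, 8)):
    """Return a dict mapping each local prerequisite to True or False."""
    return {
        # The running interpreter must be at least the minimum version
        "python": sys.version_info >= min_python,
        # FFmpeg counts as installed if an executable is found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```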
Installation
Prerequisites
Before installing, ensure you have:
- Python 3.8+ installed
- FFmpeg installed (required for audio processing)
- pip package manager
- AWS account with S3 access
- AWS credentials configured
Quick Setup
- Clone or download the project files
- Navigate to the project directory
- Follow the platform-specific instructions below
Platform-Specific Installation
Windows
Step 1: Install FFmpeg
Option A: Using Chocolatey (Recommended)
# Install Chocolatey if not already installed (run as Administrator)
# Visit https://chocolatey.org/install for full instructions
# Install FFmpeg
choco install ffmpeg
Option B: Manual Installation
- Download FFmpeg from: https://ffmpeg.org/download.html#build-windows
- Extract the zip file to a folder (e.g., C:\ffmpeg)
- Add C:\ffmpeg\bin to your system PATH:
  - Open System Properties → Advanced → Environment Variables
  - Edit the PATH variable and add the FFmpeg bin directory
  - Restart command prompt
Option C: Using Winget
# Install using Windows Package Manager
winget install ffmpeg
Step 2: Install Python Dependencies
# Install required packages
pip install -r requirements.txt
Step 3: Configure AWS Credentials
Choose one of these methods:
Method A: Environment Variables (Recommended)
# Set environment variables (replace with your actual credentials)
set AWS_ACCESS_KEY_ID=your_access_key_here
set AWS_SECRET_ACCESS_KEY=your_secret_key_here
set AWS_DEFAULT_REGION=us-east-1
Method B: AWS CLI Configuration
# Install AWS CLI if not already installed
pip install awscli
# Configure AWS credentials
aws configure
Step 4: Verify Setup
# Run the setup verification script
python check_setup.py
Step 5: Start the Service
# Option 1: Use the batch script
run.bat
# Option 2: Run directly
python main.py
Linux
Step 1: Install FFmpeg
Ubuntu/Debian:
# Update package list
sudo apt update
# Install FFmpeg
sudo apt install ffmpeg
CentOS/RHEL/Fedora:
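# For CentOS/RHEL (with EPEL repository)
# Note: on many releases the ffmpeg package comes from RPM Fusion rather than
# EPEL itself; enable RPM Fusion if the package is not found
sudo yum install epel-release
sudo yum install ffmpeg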
# For Fedora
sudo dnf install ffmpeg
Arch Linux:
# Install FFmpeg
sudo pacman -S ffmpeg
Step 2: Install Python Dependencies
# Install required packages
pip install -r requirements.txt
# Alternative: Use pip3 if needed
pip3 install -r requirements.txt
Step 3: Configure AWS Credentials
Choose one of these methods:
Method A: Environment Variables (Recommended)
# Set environment variables (replace with your actual credentials)
export AWS_ACCESS_KEY_ID=your_access_key_here
export AWS_SECRET_ACCESS_KEY=your_secret_key_here
export AWS_DEFAULT_REGION=us-east-1
# To make permanent, add to ~/.bashrc or ~/.profile
echo 'export AWS_ACCESS_KEY_ID=your_access_key_here' >> ~/.bashrc
echo 'export AWS_SECRET_ACCESS_KEY=your_secret_key_here' >> ~/.bashrc
echo 'export AWS_DEFAULT_REGION=us-east-1' >> ~/.bashrc
source ~/.bashrc
Method B: AWS CLI Configuration
# Install AWS CLI if not already installed
pip install awscli
# Configure AWS credentials
aws configure
Step 4: Make Scripts Executable
# Make the run script executable
chmod +x run.sh
Step 5: Verify Setup
# Run the setup verification script
python check_setup.py
Step 6: Start the Service
# Option 1: Use the shell script
./run.sh
# Option 2: Run directly
python main.py
macOS
Step 1: Install FFmpeg
Using Homebrew (Recommended):
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install FFmpeg
brew install ffmpeg
Using MacPorts:
# Install FFmpeg
sudo port install ffmpeg
Step 2: Install Python Dependencies
# Install required packages
pip install -r requirements.txt
# If using Homebrew Python, you might need:
pip3 install -r requirements.txt
Step 3: Configure AWS Credentials
Choose one of these methods:
Method A: Environment Variables (Recommended)
# Set environment variables (replace with your actual credentials)
export AWS_ACCESS_KEY_ID=your_access_key_here
export AWS_SECRET_ACCESS_KEY=your_secret_key_here
export AWS_DEFAULT_REGION=us-east-1
# To make permanent, add to ~/.zshrc (for zsh) or ~/.bash_profile (for bash)
echo 'export AWS_ACCESS_KEY_ID=your_access_key_here' >> ~/.zshrc
echo 'export AWS_SECRET_ACCESS_KEY=your_secret_key_here' >> ~/.zshrc
echo 'export AWS_DEFAULT_REGION=us-east-1' >> ~/.zshrc
source ~/.zshrc
Method B: AWS CLI Configuration
# Install AWS CLI if not already installed
pip install awscli
# Configure AWS credentials
aws configure
Step 4: Make Scripts Executable
# Make the run script executable
chmod +x run.sh
Step 5: Verify Setup
# Run the setup verification script
python check_setup.py
Step 6: Start the Service
# Option 1: Use the shell script
./run.sh
# Option 2: Run directly
python main.py
Usage
Starting the Service
Once installed and configured, start the service:
- Windows: run.bat or python main.py
- Linux/macOS: ./run.sh or python main.py
The API will be available at: http://localhost:8000
API Endpoints
Health Check
GET http://localhost:8000/health
Transcribe Audio File
GET http://localhost:8000/transcribe?bucket=your-s3-bucket&fileKey=path/to/audio/file.mp3
Example Usage
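The transcribe endpoint takes `bucket` and `fileKey` as query parameters, so keys containing slashes or spaces need URL encoding when requests are built programmatically. A minimal Python client might look like the sketch below. The helper names are hypothetical, and because the response format is not documented here, the body is returned as raw text.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default address given in this README


def build_transcribe_url(bucket, file_key, base_url=BASE_URL):
    """Build the /transcribe URL with URL-encoded query parameters."""
    query = urllib.parse.urlencode({"bucket": bucket, "fileKey": file_key})
    return f"{base_url}/transcribe?{query}"


def call_endpoint(url, timeout=600):
    """Send a GET request and return the raw response body as text.

    The long default timeout is an assumption: transcription may take a while.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running, `call_endpoint(BASE_URL + "/health")` would query the health check and `call_endpoint(build_transcribe_url("my-audio-bucket", "recordings/meeting.mp3"))` would request a transcription.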
# Health check
curl http://localhost:8000/health
# Transcribe an audio file
curl "http://localhost:8000/transcribe?bucket=my-audio-bucket&fileKey=recordings/meeting.mp3"
Configuration
Environment Variables
| Variable | Description | Required |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS access key | Yes |
| AWS_SECRET_ACCESS_KEY | AWS secret key | Yes |
| AWS_DEFAULT_REGION | AWS region (e.g., us-east-1) | Optional |
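A quick way to confirm that the required variables from the table are visible to the Python process is to inspect the environment before starting the service. The sketch below is an assumption about how such a check could be written, and the name `missing_aws_vars` is hypothetical. Credentials stored through aws configure live in the AWS CLI's own files and will not show up in this check.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
OPTIONAL_VARS = ("AWS_DEFAULT_REGION",)


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Required AWS environment variables are set.")
```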
GPU Support
The service automatically detects and uses GPU acceleration if available:
- NVIDIA GPUs: Requires CUDA-compatible PyTorch installation
- CPU fallback: Automatic fallback to CPU if GPU unavailable
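The GPU detection with CPU fallback described above is commonly written with `torch.cuda.is_available()`. The sketch below shows that pattern. It is an assumption about the approach rather than the exact code in main.py, and the function names are hypothetical.

```python
def pick_device(cuda_available):
    """Map a CUDA availability flag to a PyTorch device string."""
    return "cuda" if cuda_available else "cpu"


def detect_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return pick_device(torch.cuda.is_available())


if __name__ == "__main__":
    print("Selected device:", detect_device())
```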
Troubleshooting
Common Issues
- Python version error
  - Ensure Python 3.8+ is installed
  - Check with: python --version
- FFmpeg not found
  - Windows: Install via Chocolatey (choco install ffmpeg) or download manually
  - Linux: Install via package manager (sudo apt install ffmpeg or similar)
  - macOS: Install via Homebrew (brew install ffmpeg)
  - Verify installation: ffmpeg -version
- Package installation errors
  - Update pip: pip install --upgrade pip
  - Try: pip install --no-cache-dir -r requirements.txt
- AWS credentials not found
  - Verify environment variables are set
  - Check AWS CLI configuration: aws configure list
- Permission denied (Linux/macOS)
  - Make scripts executable: chmod +x run.sh
- Port already in use
  - Change port in main.py or stop other services using port 8000
- Audio processing errors
  - Ensure FFmpeg is installed and accessible in PATH
  - Try transcribing with a different audio format (MP3, WAV, FLAC)
  - Check if the audio file is corrupted
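For the "Port already in use" case, it can help to confirm whether anything is already listening on port 8000 before starting the service. The sketch below does this with a plain TCP connection attempt. The helper name `port_in_use` is hypothetical, and the check only covers the local loopback address by default.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_in_use(8000) else "free"
    print(f"Port 8000 is {state}")
```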
Verification
Run the setup check script to verify everything is configured correctly:
python check_setup.py
File Structure
whisper-file-service/
├── main.py              # Main FastAPI application
├── requirements.txt     # Python dependencies
├── check_setup.py       # Setup verification script
├── run.bat              # Windows startup script
├── run.sh               # Linux/macOS startup script
├── README.md            # This file
└── sample.mp3           # Sample audio file for testing
Dependencies
System Dependencies
- FFmpeg: Required for audio processing and format conversion
Python Dependencies
- torch: PyTorch for ML operations
- transformers: Hugging Face Transformers for Whisper
- fastapi: Web framework
- uvicorn: ASGI server
- boto3: AWS SDK for Python
- librosa: Audio processing (requires FFmpeg)
- soundfile: Audio file I/O
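To see which of the Python dependencies listed above are importable in the current environment, the standard library's `importlib.util.find_spec` can be used without actually importing the (large) packages. This is a sketch under the assumption that each package's import name matches the name in the list. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names taken from the dependency list in this README
PACKAGES = ["torch", "transformers", "fastapi", "uvicorn", "boto3", "librosa", "soundfile"]


def missing_packages(names=PACKAGES):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```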
Support
For issues or questions:
- Check the troubleshooting section above
- Run python check_setup.py to verify setup
- Check the logs when starting the service
License
This project is provided as-is for educational and development purposes.