Whisper Transcription API Service

A FastAPI-based service that transcribes audio files from AWS S3 using OpenAI's Whisper model. The service downloads audio files from S3, transcribes them using Whisper, and uploads the transcription results back to S3.
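The download, transcribe, upload flow described above can be sketched roughly as follows. This is an illustrative outline, not the code in main.py: the function name `transcribe_s3_object`, the `transcribe` callable, and the choice to store the result next to the audio file with a `.txt` extension are all assumptions made for the example. The `s3` argument is expected to behave like a boto3 S3 client, whose `download_file` and `upload_file` methods are used here.

```python
import os
import tempfile


def transcribe_s3_object(s3, bucket, file_key, transcribe):
    """Download an audio object, transcribe it, and upload the text result.

    s3         -- object with boto3-style download_file / upload_file methods
    transcribe -- hypothetical callable: local audio path -> transcript string
    Returns the S3 key the transcript was uploaded to.
    """
    suffix = os.path.splitext(file_key)[1]
    with tempfile.TemporaryDirectory() as tmp:
        audio_path = os.path.join(tmp, "input" + suffix)
        text_path = os.path.join(tmp, "output.txt")

        # 1. Fetch the audio file from S3 into a temporary directory.
        s3.download_file(bucket, file_key, audio_path)

        # 2. Run the speech-to-text step on the local copy.
        text = transcribe(audio_path)
        with open(text_path, "w", encoding="utf-8") as fh:
            fh.write(text)

        # 3. Upload the transcript. The result key is an assumption:
        #    same path as the audio file, with a .txt extension.
        result_key = os.path.splitext(file_key)[0] + ".txt"
        s3.upload_file(text_path, bucket, result_key)

    return result_key
```

With boto3 installed, `s3` could be created with `boto3.client("s3")`; the temporary directory is removed automatically once the upload finishes.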

Features

🎵 Transcribe audio files from AWS S3
🚀 High-performance Whisper model with GPU acceleration (if available)
☁️ Direct S3 integration for input and output
🔄 RESTful API with FastAPI
🖥️ Cross-platform support (Windows, Linux, macOS)
📊 Health check endpoint
🔧 Setup verification script

Requirements

Python 3.8 or higher
FFmpeg (required for audio processing)
AWS credentials configured
Audio files stored in AWS S3

Installation

Prerequisites

Before installing, ensure you have:

Python 3.8+ installed
FFmpeg installed (required for audio processing)
pip package manager
AWS account with S3 access
AWS credentials configured

Quick Setup

Clone or download the project files
Navigate to the project directory
Follow the platform-specific instructions below

Platform-Specific Installation

🪟 Windows

Step 1: Install FFmpeg

Option A: Using Chocolatey (Recommended)

# Install Chocolatey if not already installed (run as Administrator)
# Visit https://chocolatey.org/install for full instructions

# Install FFmpeg
choco install ffmpeg

Option B: Manual Installation

Download FFmpeg from: https://ffmpeg.org/download.html#build-windows
Extract the zip file to a folder (e.g., C:\ffmpeg)
Add C:\ffmpeg\bin to your system PATH:
- Open System Properties → Advanced → Environment Variables
- Edit the PATH variable and add the FFmpeg bin directory
- Restart command prompt

Option C: Using Winget

# Install using Windows Package Manager
winget install ffmpeg

Step 2: Install Python Dependencies

# Install required packages
pip install -r requirements.txt

Step 3: Configure AWS Credentials

Choose one of these methods:

Method A: Environment Variables (Recommended)

# Set environment variables for the current Command Prompt session
# (replace with your actual credentials; in PowerShell use $env:NAME = "value" instead)
set AWS_ACCESS_KEY_ID=your_access_key_here
set AWS_SECRET_ACCESS_KEY=your_secret_key_here
set AWS_DEFAULT_REGION=us-east-1

Method B: AWS CLI Configuration

# Install AWS CLI if not already installed
pip install awscli

# Configure AWS credentials
aws configure

Step 4: Verify Setup

# Run the setup verification script
python check_setup.py

Step 5: Start the Service

# Option 1: Use the batch script
run.bat

# Option 2: Run directly
python main.py

🐧 Linux

Step 1: Install FFmpeg

Ubuntu/Debian:

# Update package list
sudo apt update

# Install FFmpeg
sudo apt install ffmpeg

CentOS/RHEL/Fedora:

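# For CentOS/RHEL: ffmpeg is typically packaged in RPM Fusion rather than EPEL,
# so enable RPM Fusion (see https://rpmfusion.org/Configuration) in addition to EPEL
sudo yum install epel-release
sudo yum install ffmpeg

# For Fedora (may also require RPM Fusion, depending on the release)
sudo dnf install ffmpeg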

Arch Linux:

# Install FFmpeg
sudo pacman -S ffmpeg

Step 2: Install Python Dependencies

# Install required packages
pip install -r requirements.txt

# Alternative: Use pip3 if needed
pip3 install -r requirements.txt

Step 3: Configure AWS Credentials

Choose one of these methods:

Method A: Environment Variables (Recommended)

# Set environment variables (replace with your actual credentials)
export AWS_ACCESS_KEY_ID=your_access_key_here
export AWS_SECRET_ACCESS_KEY=your_secret_key_here
export AWS_DEFAULT_REGION=us-east-1

# To make permanent, add to ~/.bashrc or ~/.profile
echo 'export AWS_ACCESS_KEY_ID=your_access_key_here' >> ~/.bashrc
echo 'export AWS_SECRET_ACCESS_KEY=your_secret_key_here' >> ~/.bashrc
echo 'export AWS_DEFAULT_REGION=us-east-1' >> ~/.bashrc
source ~/.bashrc

Method B: AWS CLI Configuration

# Install AWS CLI if not already installed
pip install awscli

# Configure AWS credentials
aws configure

Step 4: Make Scripts Executable

# Make the run script executable
chmod +x run.sh

Step 5: Verify Setup

# Run the setup verification script
python check_setup.py

Step 6: Start the Service

# Option 1: Use the shell script
./run.sh

# Option 2: Run directly
python main.py

🍎 macOS

Step 1: Install FFmpeg

Using Homebrew (Recommended):

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install FFmpeg
brew install ffmpeg

Using MacPorts:

# Install FFmpeg
sudo port install ffmpeg

Step 2: Install Python Dependencies

# Install required packages
pip install -r requirements.txt

# If using Homebrew Python, you might need:
pip3 install -r requirements.txt

Step 3: Configure AWS Credentials

Choose one of these methods:

Method A: Environment Variables (Recommended)

# Set environment variables (replace with your actual credentials)
export AWS_ACCESS_KEY_ID=your_access_key_here
export AWS_SECRET_ACCESS_KEY=your_secret_key_here
export AWS_DEFAULT_REGION=us-east-1

# To make permanent, add to ~/.zshrc (for zsh) or ~/.bash_profile (for bash)
echo 'export AWS_ACCESS_KEY_ID=your_access_key_here' >> ~/.zshrc
echo 'export AWS_SECRET_ACCESS_KEY=your_secret_key_here' >> ~/.zshrc
echo 'export AWS_DEFAULT_REGION=us-east-1' >> ~/.zshrc
source ~/.zshrc

Method B: AWS CLI Configuration

# Install AWS CLI if not already installed
pip install awscli

# Configure AWS credentials
aws configure

Step 4: Make Scripts Executable

# Make the run script executable
chmod +x run.sh

Step 5: Verify Setup

# Run the setup verification script
python check_setup.py

Step 6: Start the Service

# Option 1: Use the shell script
./run.sh

# Option 2: Run directly
python main.py

Usage

Starting the Service

Once installed and configured, start the service:

Windows: run.bat or python main.py
Linux/macOS: ./run.sh or python main.py

The API will be available at: http://localhost:8000

API Endpoints

Health Check

GET http://localhost:8000/health

Transcribe Audio File

GET http://localhost:8000/transcribe?bucket=your-s3-bucket&fileKey=path/to/audio/file.mp3

Example Usage

# Health check
curl http://localhost:8000/health

# Transcribe an audio file
curl "http://localhost:8000/transcribe?bucket=my-audio-bucket&fileKey=recordings/meeting.mp3"
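The same requests can be made from Python using only the standard library. The sketch below assumes the service is running locally on port 8000 as described above. The helper names `build_transcribe_url` and `transcribe` are made up for this example, and because the response format is not documented here, the body is returned as raw text. Building the query with `urlencode` matters when `fileKey` contains spaces or slashes.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from this README


def build_transcribe_url(bucket, file_key, base_url=BASE_URL):
    """Return the /transcribe URL with bucket and fileKey safely URL-encoded."""
    query = urlencode({"bucket": bucket, "fileKey": file_key})
    return f"{base_url}/transcribe?{query}"


def transcribe(bucket, file_key, timeout=600):
    """Call the endpoint and return the raw response body as text.

    The generous timeout is an assumption: transcription of long
    recordings may take a while.
    """
    url = build_transcribe_url(bucket, file_key)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `transcribe("my-audio-bucket", "recordings/meeting.mp3")` would issue the same request as the curl command above.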

Configuration

Environment Variables

Variable	Description	Required
`AWS_ACCESS_KEY_ID`	AWS access key	Yes
`AWS_SECRET_ACCESS_KEY`	AWS secret key	Yes
`AWS_DEFAULT_REGION`	AWS region (e.g., us-east-1)	No
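A quick way to confirm the required variables are visible to Python is sketched below. The function name `missing_aws_vars` is hypothetical and this is not taken from check_setup.py. Note the check only looks at environment variables, so it will report them as missing if credentials were instead set up through `aws configure`.

```python
import os

# Variables marked as required in the table above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("AWS credential variables are set.")
```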

GPU Support

The service automatically detects and uses GPU acceleration if available:

NVIDIA GPUs: Requires CUDA-compatible PyTorch installation
CPU fallback: Automatic fallback to CPU if GPU unavailable
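Device selection of this kind is commonly written with `torch.cuda.is_available()`. The snippet below is a minimal sketch of that pattern, not necessarily how main.py does it, and the name `pick_device` is made up for the example. It can also be run on its own to see which device PyTorch would use on a given machine.

```python
def pick_device():
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is likely a CPU-only one.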

Troubleshooting

Common Issues

Python version error
- Ensure Python 3.8+ is installed
- Check with: python --version

FFmpeg not found
- Windows: Install via Chocolatey (choco install ffmpeg) or download manually
- Linux: Install via package manager (sudo apt install ffmpeg or similar)
- macOS: Install via Homebrew (brew install ffmpeg)
- Verify installation: ffmpeg -version

Package installation errors
- Update pip: pip install --upgrade pip
- Try: pip install --no-cache-dir -r requirements.txt

AWS credentials not found
- Verify environment variables are set
- Check AWS CLI configuration: aws configure list

Permission denied (Linux/macOS)
- Make scripts executable: chmod +x run.sh

Port already in use
- Change port in main.py or stop other services using port 8000

Audio processing errors
- Ensure FFmpeg is installed and accessible in PATH
- Try transcribing with a different audio format (MP3, WAV, FLAC)
- Check if the audio file is corrupted
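For the "Port already in use" case, a small standard-library check can confirm whether something is already listening on port 8000 before starting the service. The helper name `port_in_use` is an assumption for this sketch, and it only tests TCP connections to the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8000):
        print("Port 8000 is already in use.")
    else:
        print("Port 8000 is free.")
```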

Verification

Run the setup check script to verify everything is configured correctly:

python check_setup.py
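If the script is unavailable, two of the prerequisites can be checked by hand. The sketch below is an independent illustration, not the contents of check_setup.py: it verifies the Python version and looks for `ffmpeg` on PATH using `shutil.which`. The function name `check_environment` is made up for the example.

```python
import shutil
import sys


def check_environment(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Python version and FFmpeg look fine.")
```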

File Structure

whisper-file-service/
├── main.py              # Main FastAPI application
├── requirements.txt     # Python dependencies
├── check_setup.py      # Setup verification script
├── run.bat             # Windows startup script
├── run.sh              # Linux/macOS startup script
├── README.md           # This file
└── sample.mp3          # Sample audio file for testing

Dependencies

System Dependencies

FFmpeg: Required for audio processing and format conversion

Python Dependencies

torch: PyTorch for ML operations
transformers: Hugging Face Transformers for Whisper
fastapi: Web framework
uvicorn: ASGI server
boto3: AWS SDK for Python
librosa: Audio processing (requires FFmpeg)
soundfile: Audio file I/O
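To see which of these packages are importable in the current environment, something like the following can be used. It is a sketch that assumes each package's import name matches the name listed above, which holds for these seven; `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names of the Python dependencies listed above.
PACKAGES = (
    "torch",
    "transformers",
    "fastapi",
    "uvicorn",
    "boto3",
    "librosa",
    "soundfile",
)


def missing_packages(names=PACKAGES):
    """Return the names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

Anything reported as missing can usually be fixed by re-running `pip install -r requirements.txt`.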

Support

For issues or questions:

Check the troubleshooting section above
Run python check_setup.py to verify setup
Check the logs when starting the service

License

This project is provided as-is for educational and development purposes.
