danielsobrado/audio-processor
Audio Processor
An audio processing application focused on transcription, diarization, summarization, and translation, with robust support for both English and Arabic languages. This project provides a scalable and efficient solution for handling various audio processing tasks.
Features
- Multilingual Transcription: Accurately transcribes audio in both English and Arabic.
- Speaker Diarization: Identifies and separates different speakers in an audio recording.
- Audio Summarization: Generates concise summaries from transcribed audio content.
- Translation: Translates transcribed content.
- Graph Database Integration: Neo4j-powered conversation analysis with speaker networks and topic flows.
- Speaker Network Analysis: Analyze interaction patterns, speaking time, and turn-taking behaviors.
- Topic Flow Tracking: Track conversation transitions and keyword-based topic extraction.
- Entity Extraction: Identify and link structured data (emails, phones, dates, URLs, mentions).
- Asynchronous Processing: Utilizes a job queue (Celery) for efficient handling of long-running audio processing tasks.
- RESTful API: Provides a clean and well-documented API for interacting with the service.
- Containerized Deployment: Docker and Kubernetes support for easy deployment and scaling.
- Database Integration: Stores job results and metadata.
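The entity types listed above (emails, phones, dates, URLs, mentions) are classic targets for pattern-based extraction. As a rough illustration (the patterns and function below are hypothetical, not the project's actual extractor), entity extraction over a transcript can be sketched like this:

```python
import re

# Illustrative patterns only -- the project's real extractor may use
# different, more robust rules than these.
ENTITY_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "url": re.compile(r"https?://[^\s]+"),
    "mention": re.compile(r"@\w+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

def extract_entities(text: str) -> dict[str, list[str]]:
    """Return every match for each entity type found in the transcript text."""
    return {name: pattern.findall(text) for name, pattern in ENTITY_PATTERNS.items()}
```

Extracted entities like these can then be linked to speakers and conversations in the graph database described below.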
Technologies Used
- Backend: Python 3.12+, FastAPI
- Audio Processing: WhisperX, producing Deepgram API-compatible responses
- Task Queue: Celery, Redis (as broker and backend)
- Database: PostgreSQL (via SQLAlchemy)
- Graph Database: Neo4j (for conversation analysis)
- Migrations: Alembic
- Containerization: Docker
- Orchestration: Kubernetes
- Dependency Management: uv (fast Python package manager)
- Testing: Pytest with comprehensive test infrastructure
Setup and Installation
Follow these steps to set up the project locally.
Prerequisites
- Python 3.12+
- uv (recommended for dependency management, install from astral.sh/uv)
- Docker (for running services like Redis and PostgreSQL)
Note: This project has migrated from Poetry to uv for faster dependency management. If you have an existing setup with Poetry, you can migrate by running `uv sync --dev` after installing uv.
1. Clone the repository

```bash
git clone https://github.com/xxxx/audio-processor.git
cd audio-processor
```

2. Set up Environment Variables

Copy the example environment file and update it with your configurations:

```bash
cp .env.example .env
```

Edit the .env file and fill in your keys. You might also want to adjust database or Redis connection settings if you're not using the default Docker setup.
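Under the hood, a .env file is just KEY=VALUE lines that the application's settings layer turns into configuration (FastAPI projects commonly use pydantic-settings for this). Purely as a dependency-free sketch of what that loading amounts to, not the project's actual mechanism:

```python
import os

def load_dotenv_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

def apply_env(values: dict[str, str]) -> None:
    """Export values without overriding variables already set in the shell."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

The setdefault call matters: values already exported in the shell (or by Docker/Kubernetes) take precedence over the file, which is the convention most dotenv loaders follow.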
3. Install Dependencies
Using uv (recommended):

```bash
uv sync --dev
```

If you prefer pip:

```bash
pip install -r requirements.txt
```

4. Install uv (if not already installed)

Windows:

```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

macOS/Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Verify installation:

```bash
uv --version
```

5. Database Setup
Ensure Docker is running. Then, start the PostgreSQL and Redis containers using Docker Compose:
```bash
docker-compose -f deployment/docker/docker-compose.yml up -d db redis
```

Run database migrations:

```bash
uv run alembic upgrade head
```

If you're not using uv:

```bash
alembic upgrade head
```

6. Graph Database Setup (Optional)
The application includes optional Neo4j integration for conversation analysis. To enable graph functionality:
Start Neo4j with Docker:
```bash
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/devpassword \
  neo4j:5.15-community
```

Enable in configuration:

Edit your environment-specific YAML config file (e.g., config/development.yaml):

```yaml
graph:
  enabled: true
  neo4j:
    url: "bolt://localhost:7687"
    username: "neo4j"
    password: "devpassword"
```

Access Neo4j Browser:
- URL: http://localhost:7474
- Username: neo4j
- Password: devpassword
The graph functionality is completely optional and the application will work normally with graph.enabled: false.
Graph Visualization:
The Neo4j Browser provides built-in graph visualization capabilities. For custom visualizations, you can use the graph API endpoints to export data for tools like:
- D3.js for web-based visualizations
- Cytoscape.js for network analysis
- Gephi for advanced graph analytics
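If you prefer to query the graph directly from Python rather than through the Neo4j Browser, the official neo4j driver can connect with the same credentials configured above. The Cypher here is illustrative only: the Speaker label and SPOKE_WITH relationship are assumptions, not the project's documented schema.

```python
# Illustrative only: the Cypher assumes node labels like (:Speaker) and
# relationships like [:SPOKE_WITH], which may differ from the real schema.
def speaker_network_query(speaker_id: str) -> tuple[str, dict]:
    """Build a parameterized Cypher query listing who a speaker interacted with."""
    query = (
        "MATCH (s:Speaker {id: $speaker_id})-[r:SPOKE_WITH]-(other:Speaker) "
        "RETURN other.id AS other_speaker, r.turns AS turns"
    )
    return query, {"speaker_id": speaker_id}

if __name__ == "__main__":
    # Requires `pip install neo4j` and a running Neo4j instance (see above).
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "devpassword"))
    query, params = speaker_network_query("speaker_0")
    with driver.session() as session:
        for record in session.run(query, params):
            print(record["other_speaker"], record["turns"])
    driver.close()
```

Passing parameters via `$speaker_id` rather than string interpolation lets Neo4j cache the query plan and avoids Cypher injection.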
Running the Application
Locally (Development)
- Start Services: Ensure PostgreSQL and Redis are running (e.g., via `docker-compose up -d db redis`).

- Run FastAPI Application:

```bash
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

If you're not using uv:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000.

- Run Celery Worker: Open a new terminal and run:

```bash
uv run celery -A app.workers.celery_app worker -l info
```

If you're not using uv:

```bash
celery -A app.workers.celery_app worker -l info
```

This worker will process the audio tasks.
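One detail worth knowing when wiring a client to this setup: Celery reports task state in its own vocabulary (PENDING, STARTED, SUCCESS, and so on), which an API layer typically translates into friendlier job statuses. The mapping below is hypothetical and only illustrates the pattern; check the actual /status responses for the real values.

```python
# Hypothetical mapping from Celery task states to the job statuses a client
# might see from GET /api/v1/status/{job_id}; the real API may use other names.
CELERY_STATE_TO_JOB_STATUS = {
    "PENDING": "queued",
    "STARTED": "processing",
    "RETRY": "processing",
    "SUCCESS": "completed",
    "FAILURE": "failed",
    "REVOKED": "cancelled",
}

def job_status(celery_state: str) -> str:
    """Translate a raw Celery state into a client-facing job status."""
    return CELERY_STATE_TO_JOB_STATUS.get(celery_state, "unknown")
```

Note that Celery reports PENDING both for queued tasks and for unknown task IDs, so production APIs usually cross-check the job against their own database before answering.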
Using Docker Compose
For a full local deployment including the FastAPI app, Celery worker, Redis, and PostgreSQL:
```bash
docker-compose -f deployment/docker/docker-compose.yml up --build -d
```

This will build the Docker images and start all necessary services. The API will be available at http://localhost:8000.
Deployment on Kubernetes
The deployment/kubernetes directory contains YAML files for deploying the application to a Kubernetes cluster.
- Build and Push Docker Images:

You'll need to build your Docker image and push it to a container registry accessible by your Kubernetes cluster:

```bash
docker build -t your-registry/audio-processor:latest -f deployment/docker/Dockerfile .
docker push your-registry/audio-processor:latest
```

Remember to update the `image` field in deployment/kubernetes/deployment.yaml to point to your image.

- Apply Kubernetes Manifests:

```bash
kubectl apply -f deployment/kubernetes/namespace.yaml
kubectl apply -f deployment/kubernetes/configmap.yaml
kubectl apply -f deployment/kubernetes/secrets.yaml    # Ensure secrets are properly managed
kubectl apply -f deployment/kubernetes/pvc.yaml
kubectl apply -f deployment/kubernetes/deployment.yaml
kubectl apply -f deployment/kubernetes/service.yaml
kubectl apply -f deployment/kubernetes/ingress.yaml    # If you have an Ingress controller
kubectl apply -f deployment/kubernetes/hpa.yaml        # For Horizontal Pod Autoscaling
```
Dependency Management
This project uses uv for fast and reliable dependency management.
Adding Dependencies
```bash
# Add a production dependency
uv add package-name

# Add a development dependency
uv add --dev package-name

# Add specific version
uv add "package-name>=1.0.0,<2.0.0"
```

Managing Dependencies

```bash
# Install all dependencies (including dev)
uv sync --dev

# Install only production dependencies
uv sync --no-dev

# Update dependencies
uv sync

# Remove a dependency
uv remove package-name

# Show dependency tree
uv tree
```

Why uv?
- 10-100x faster than pip and poetry
- Better caching and dependency resolution
- Single tool for dependency management, virtual environments, and package building
- Drop-in replacement with familiar commands
- Cross-platform consistency
Development Commands
Quick Reference
```bash
# Setup project
uv sync --dev

# Start development server
uv run uvicorn app.main:app --reload

# Run tests
uv run pytest

# Code formatting and linting
uv run black .
uv run isort .
uv run flake8 .
uv run mypy .

# Database migrations
uv run alembic upgrade head
uv run alembic revision --autogenerate -m "description"

# Start Celery worker
uv run celery -A app.workers.celery_app worker -l info
```

API Endpoints
You can access the API documentation at http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc) when the application is running.
Key endpoints include:
- POST /api/v1/transcribe: Submits an audio file for transcription, diarization, summarization, and/or translation.
- GET /api/v1/status/{job_id}: Checks the status of a submitted job.
- GET /api/v1/results/{job_id}: Retrieves the results of a completed job.
- GET /api/v1/health: Health check endpoint.
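A typical client workflow against these endpoints is submit, poll, fetch. The sketch below uses only the standard library; the `status` field name and its values are assumptions on my part, so verify the real response schema in the Swagger UI at /docs.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"

def status_url(job_id: str) -> str:
    return f"{BASE_URL}/status/{job_id}"

def results_url(job_id: str) -> str:
    return f"{BASE_URL}/results/{job_id}"

def poll_until_done(job_id: str, interval: float = 5.0) -> dict:
    """Poll the status endpoint until the job finishes, then fetch its results.

    The 'status' field and the 'completed'/'failed' values are assumptions,
    not documented guarantees -- check /docs for the real schema.
    """
    while True:
        with urllib.request.urlopen(status_url(job_id)) as resp:
            status = json.load(resp).get("status")
        if status in ("completed", "failed"):
            break
        time.sleep(interval)
    with urllib.request.urlopen(results_url(job_id)) as resp:
        return json.load(resp)
```

Polling with a fixed interval is the simplest option; for long audio files an exponential backoff is gentler on the API.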
Graph API Endpoints (when enabled):
- GET /api/v1/graph/stats: Graph database statistics and connection status.
- GET /api/v1/graph/speakers: List all speakers and their interaction patterns.
- GET /api/v1/graph/topics: List all topics and their relationships.
- GET /api/v1/graph/conversations/{conversation_id}: Get the complete conversation graph.
- GET /api/v1/graph/speakers/{speaker_id}/network: Get a speaker's interaction network.
- GET /api/v1/graph/topics/{topic_id}/flow: Get topic flow and transitions.
Configuration
Configuration settings are managed through environment variables and YAML files located in the config/ directory.
- .env: Local environment variables (sensitive data like API keys).
- config/: Environment-specific configurations (e.g., development.yaml, production.yaml).
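A common pattern with such a layout is to load a base configuration and overlay the environment-specific YAML on top, with the override winning on conflicts. Whether this project layers its configs exactly this way is not documented here, so treat the recursive merge below as a generic sketch of the technique:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`; override wins on conflicts.

    Nested dicts are merged key by key, so an environment file only needs to
    specify the values it changes. Inputs are left unmodified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

With this approach, development.yaml can flip `graph.enabled` without repeating the Neo4j connection block from the base config.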
Feature Flags
The application includes comprehensive feature flags for runtime control of core functionality. All feature flags are properly implemented and enforced at the API boundary.
Available Feature Flags
| Feature Flag | Environment Variable | Default | Description |
|---|---|---|---|
| `enable_audio_upload` | `ENABLE_AUDIO_UPLOAD` | `true` | Controls direct audio file uploads |
| `enable_url_processing` | `ENABLE_URL_PROCESSING` | `true` | Controls processing from URLs |
| `enable_translation` | `TRANSLATION_ENABLED` | `true` | Controls translation functionality |
| `enable_summarization` | `ENABLE_SUMMARIZATION` | `true` | Controls summarization functionality |
| `graph.enabled` | `GRAPH_ENABLED` | `true` | Controls Neo4j graph features |
| `auth.verify_signature` | `JWT_VERIFY_SIGNATURE` | `true` | Controls JWT signature verification |
| `auth.verify_audience` | `JWT_VERIFY_AUDIENCE` | `true` | Controls JWT audience verification |
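Boolean environment variables like the ones in the table are usually parsed leniently from strings. As an illustration (the accepted spellings here are my assumption; the project's settings layer may be stricter), a flag reader might look like:

```python
import os

def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean feature flag from an environment variable.

    Truthy spellings accepted here ('1', 'true', 'yes', 'on') are an
    assumption for illustration; anything else reads as False.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")
```

Defaulting to True when the variable is unset matches the table above, where every flag ships enabled.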
Feature Flag Implementation
API Enforcement: All feature flags are enforced at the API boundary in app/api/v1/endpoints/transcribe.py:
```python
# Example: Audio upload blocking
if file and not settings.enable_audio_upload:
    raise HTTPException(
        status_code=403,
        detail="Direct audio file uploads are currently disabled."
    )

# Example: Translation blocking
if translate and not settings.translation.enabled:
    raise HTTPException(
        status_code=400,
        detail="Translation feature is currently disabled."
    )
```

Environment Configuration: Disable features via environment variables:

```bash
# Disable file uploads
export ENABLE_AUDIO_UPLOAD=false

# Disable translation
export TRANSLATION_ENABLED=false

# Disable summarization
export ENABLE_SUMMARIZATION=false
```

Test Coverage: Feature flags are thoroughly tested in tests/unit/test_feature_flags.py with 341 lines of comprehensive test coverage.
Production Usage: Feature flags are production-ready and can be safely used to control system behavior in real-time.
Testing
Using Test Scripts (Recommended)
Windows:
```
# First time setup
scripts\setup-tests.bat

# Run all tests
scripts\run-tests.bat

# Run with coverage
scripts\run-tests.bat coverage

# Quick tests only
scripts\test-quick.bat
```

Linux/WSL:

```bash
# First time setup
./scripts/setup-tests.sh

# Run all tests
./scripts/run-tests.sh

# Run with coverage
./scripts/run-tests.sh coverage

# Quick tests only
./scripts/test-quick.sh
```

PowerShell:

```powershell
# Run all tests
.\scripts\run-tests.ps1

# Run with coverage
.\scripts\run-tests.ps1 coverage
```

Direct Commands

Using uv:

```bash
uv run pytest
```

With coverage:

```bash
uv run pytest --cov=app --cov-report=html
```

If you're not using uv:

```bash
pytest
```

Test Documentation
See tests/TESTING.md for comprehensive testing documentation including:
- Test environment setup
- Available test types (unit, integration, coverage)
- Troubleshooting guide
- Cross-platform compatibility
Troubleshooting
Common Issues
- uv not found: Install uv from astral.sh/uv

- Permission denied (Linux/WSL): Make scripts executable:

```bash
chmod +x scripts/*.sh
```

- Tests failing: Ensure environment variables are set:

```bash
cp .env.example .env.test
# Edit .env.test with test-specific values
```

- Dependencies not installing: Try clearing the cache and reinstalling:

```bash
uv cache clean
uv sync --dev
```

- Port already in use: Change the port in your .env file or kill the process:

```bash
# Find process using port 8000
lsof -i :8000                  # macOS/Linux
netstat -ano | findstr :8000   # Windows
```
Performance Tips
- Use `uv run` for consistent dependency management
- Keep .env.test for isolated test environments
- Use Docker for external services (Redis, PostgreSQL)
- Monitor Celery worker logs for task processing issues

Getting Help

- Check logs: `docker-compose logs` for containerized services
- Verify environment: Run environment tests with `scripts/run-tests.sh env`
- Review documentation: See tests/TESTING.md for testing details
Contributing
Contributions are welcome! Please follow standard GitHub practices: fork the repository, create a new branch, commit your changes, and open a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.