sayedmahmoud266/quran-ai-transcriping
Quran AI transcriping with accurate Ayah, Surah Matching with audio timestamps
Experimental Quran AI Transcription
Note: This repository is currently an experimental implementation using Whisper for Quran transcription. It will soon be archived as I'm working on a new version that:
- Uses Google's Gemini for far better accuracy
- Processes results significantly faster
- Provides a more robust and maintainable codebase
The new version will be available in a separate repository.
A Python API for transcribing Quran recitations from audio files. On the surahs tested so far (97 and 55), every verse was detected. It uses a constraint propagation algorithm for verse matching and the fine-tuned tarteel-ai/whisper-base-ar-quran model.
🌟 Key Features
Advanced Verse Matching (v2.0.0) ⭐ NEW
- Constraint Propagation Algorithm: Multi-batch analysis for accurate surah identification
- Backward Gap Filling: Automatically detects and fills missing ayahs
- Forward Consecutive Matching: Handles repeated phrases and long surahs
- 100% Accuracy: Tested on multiple surahs (97, 55) with perfect detection
- PyQuran Integration: 6,236 verses with full tashkeel support
Audio Processing
- Multi-format Support: MP3, WAV, M4A, WMA, AAC, FLAC, OGG, OPUS, WebM
- Intelligent Chunking: Silence-based audio splitting for optimal accuracy
- High-Quality Resampling: Kaiser-best algorithm for 16kHz conversion
- MP3 Optimization: Uses pydub for reliable MP3 loading (fixes truncation issues)
Performance
- Fast Processing: ~1 second per minute of audio
- GPU Acceleration: Automatic CUDA support when available
- 85-95% Coverage: High text coverage with minimal trailing time
- Detailed Diagnostics: Coverage metrics, timestamps, confidence scores
API Features
- RESTful Design: Simple HTTP API built with FastAPI
- Comprehensive Output: Transcription, verse details, timestamps, diagnostics
- Error Handling: Robust error handling and validation
- Auto-reload: Development mode with hot-reload support
Requirements
- Python 3.8 or higher
- Virtual environment support
- At least 2GB RAM (4GB+ recommended)
- GPU support optional (CUDA-compatible GPU for faster processing)
Installation
1. Clone or navigate to the repository
cd /path/to/tarteel-ai_whisper-base-ar-quran
2. Run the setup script
chmod +x setup.sh
./setup.sh
This will:
- Create a virtual environment
- Install all required dependencies
- Set up the project for running
3. Activate the virtual environment
source venv/bin/activate
Usage
Starting the Server
Option 1: Using Make (recommended) ⭐
make start
This is the simplest way to start the server. The Makefile handles virtual environment activation and server startup automatically.
Option 2: Using the run script
chmod +x run.sh
./run.sh
Option 3: Manual start
source venv/bin/activate
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
The API will be available at http://localhost:8000
Other Make Commands
make help # Show all available commands
make setup # Install dependencies and setup virtual environment
make install # Alias for setup
make dev # Start in development mode with debug logging
make test # Run tests (placeholder for future tests)
make clean # Clean up temporary files and cache
make clean-all # Clean everything including virtual environment
make freeze # Generate requirements.txt from current environment
make check # Check if all dependencies are installed
make logs # Show recent server logs
make info # Show project information
Run make or make help to see all available commands with descriptions.
API Endpoints
1. Root Endpoint
GET /
Returns API information and available endpoints.
2. Health Check
GET /health
Returns the health status of the API and model information.
3. Transcribe Audio (Sync)
POST /transcribe
Parameters:
audio_file (file, required): Audio file containing Quran recitation
Supported Audio Formats:
- MP3 (.mp3)
- WAV (.wav)
- M4A (.m4a)
- WMA (.wma)
- AAC (.aac)
- FLAC (.flac)
- OGG (.ogg)
- OPUS (.opus)
- WebM (.webm)
Response Format (Single Verse):
{
"success": true,
"data": {
"exact_transcription": "بسم الله الرحمن الرحيم",
"details": [
{
"surah_number": 1,
"ayah_number": 1,
"ayah_text_tashkeel": "بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ",
"ayah_word_count": 4,
"start_from_word": 1,
"end_to_word": 4,
"audio_start_timestamp": "00:00:00.000",
"audio_end_timestamp": "00:00:03.500",
"match_confidence": 0.95,
"is_basmala": true
}
]
}
}
Response Format (Multiple Consecutive Verses):
{
"success": true,
"data": {
"exact_transcription": "بسم الله الرحمن الرحيم الحمد لله رب العالمين",
"details": [
{
"surah_number": 1,
"ayah_number": 1,
"ayah_text_tashkeel": "بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ",
"ayah_word_count": 4,
"start_from_word": 1,
"end_to_word": 4,
"audio_start_timestamp": "00:00:00.000",
"audio_end_timestamp": "00:00:02.000",
"match_confidence": 0.95,
"is_basmala": true
},
{
"surah_number": 1,
"ayah_number": 2,
"ayah_text_tashkeel": "الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ",
"ayah_word_count": 4,
"start_from_word": 1,
"end_to_word": 4,
"audio_start_timestamp": "00:00:02.000",
"audio_end_timestamp": "00:00:04.500",
"match_confidence": 0.92
}
]
}
}
Notes:
- match_confidence indicates how well the transcription matches the identified verse (0.0 to 1.0, where 1.0 is a perfect match)
- is_basmala: true indicates this is the Basmala (بسم الله الرحمن الرحيم)
- For Surah 1 (Al-Fatiha), Basmala has ayah_number: 1
- For other surahs, Basmala has ayah_number: 0 (not officially numbered)
- Multiple consecutive ayahs are automatically detected and returned
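The response shape above can be consumed with a few lines of Python. The following is a minimal sketch, assuming the field names shown in the sample responses; the helper names `parse_timestamp` and `summarize` are illustrative and not part of the API.

```python
def parse_timestamp(ts: str) -> float:
    """Convert an 'HH:MM:SS.mmm' timestamp into seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize(response: dict) -> list:
    """Flatten the 'details' list into one small dict per detected ayah."""
    rows = []
    for detail in response["data"]["details"]:
        start = parse_timestamp(detail["audio_start_timestamp"])
        end = parse_timestamp(detail["audio_end_timestamp"])
        rows.append({
            "ref": f'{detail["surah_number"]}:{detail["ayah_number"]}',
            "duration": round(end - start, 3),
            # The sample responses omit is_basmala on ordinary ayahs,
            # so a missing key is treated as False here.
            "is_basmala": detail.get("is_basmala", False),
            "confidence": detail["match_confidence"],
        })
    return rows


if __name__ == "__main__":
    sample = {
        "success": True,
        "data": {
            "details": [
                {"surah_number": 1, "ayah_number": 2,
                 "audio_start_timestamp": "00:00:02.000",
                 "audio_end_timestamp": "00:00:04.500",
                 "match_confidence": 0.92},
            ]
        },
    }
    for row in summarize(sample):
        print(row)
```

Reading `is_basmala` with `.get` keeps the loop from failing on entries that do not carry the flag.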
4. Async Transcription
POST /transcribe/async
GET /jobs/{job_id}/status
GET /jobs/{job_id}/download
GET /jobs/{job_id}/metadata
GET /jobs
For long-running transcriptions, use the async API. See ASYNC_API.md for details.
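A client for the async endpoints typically submits the file and then polls the status endpoint until the job finishes. The sketch below shows only the polling loop. The `"status"` field and the `"completed"` / `"failed"` values are assumptions inferred from the job descriptions in this README; ASYNC_API.md is the authoritative reference for the real response shape.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], dict],
                 poll_seconds: float = 2.0,
                 timeout_seconds: float = 600.0,
                 done_states=("completed", "failed")) -> dict:
    """Call get_status() repeatedly until the job reaches a final state.

    get_status is any zero-argument callable returning the decoded JSON of
    GET /jobs/{job_id}/status. The 'status' key is an assumed field name.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status.get("status") in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(poll_seconds)


# Hypothetical usage with the requests library ('job_id' is an assumed field):
#
#   import requests
#   base = "http://localhost:8000"
#   with open("quran_recitation.mp3", "rb") as audio:
#       job = requests.post(f"{base}/transcribe/async",
#                           files={"audio_file": audio}).json()
#   final = wait_for_job(
#       lambda: requests.get(f"{base}/jobs/{job['job_id']}/status").json())
```

Passing the status lookup in as a callable keeps the loop independent of any HTTP library and makes it easy to exercise without a running server.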
5. Job Management
POST /jobs/resume
DELETE /jobs/finished
Resume Job Queue: Restart any jobs that are still in processing status (useful after server restart).
Clear Finished Jobs: Delete all completed/failed jobs from database and remove their files.
See JOB_MANAGEMENT.md for details.
Example Usage
Using cURL
curl -X POST "http://localhost:8000/transcribe" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "audio_file=@/path/to/quran_recitation.mp3"
Using Python requests
import requests

url = "http://localhost:8000/transcribe"
# Open the file in a context manager so the handle is closed after the upload
with open("quran_recitation.mp3", "rb") as audio:
    response = requests.post(url, files={"audio_file": audio})
print(response.json())
Using JavaScript/Fetch
const formData = new FormData();
formData.append('audio_file', audioFile);
fetch('http://localhost:8000/transcribe', {
method: 'POST',
body: formData
})
.then(response => response.json())
.then(data => console.log(data));
📊 Performance Metrics
| Metric | Value |
|---|---|
| Accuracy | 100% (tested on Surah 97 & 55) |
| Coverage | 85-95% of transcribed text |
| Processing Speed | ~1 second per minute of audio |
| Trailing Time | <1 minute (down from 5+ minutes) |
| Memory Usage | ~650 MB |
Test Results
Surah 97 (Al-Qadr) - Short Surah
- ✅ 6/6 ayahs detected (100%)
- ✅ Confidence: 87.5% - 100%
- ✅ Trailing time: 7.3 seconds
Surah 55 (Ar-Rahman) - Long Surah with Repeated Phrases
- ✅ 78/78 ayahs detected (100%)
- ✅ Confidence: 82% - 100%
- ✅ Trailing time: 51 seconds
- ✅ Handles 31 repetitions of "فَبِأَيِّ آلَاءِ رَبِّكُمَا تُكَذِّبَانِ"
📁 Project Structure
.
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI application
│ ├── audio_processor.py # Audio processing (pydub, librosa)
│ ├── transcription_service.py # Whisper transcription service
│ └── quran_data.py # Verse matching algorithms
├── docs/
│ ├── ALGORITHM.md # Complete algorithm documentation
│ ├── PROJECT_STATUS.md # Project status and roadmap
│ ├── .diagrams/ # PlantUML diagrams
│ │ ├── src/ # Source .puml files
│ │ └── images/ # Rendered PNG diagrams
│ └── [legacy docs...] # Previous documentation
├── .gitignore
├── requirements.txt # Python dependencies
├── makefile # Make commands (start, setup, clean)
├── setup.sh # Setup script
├── run.sh # Run script
├── LICENSE # MIT License
└── README.md # This file
📖 Documentation
- ALGORITHM.md - Complete technical documentation with diagrams
- ASYNC_API.md - Async API documentation for background job processing
- JOB_MANAGEMENT.md - Job management APIs (resume queue, clear jobs)
- PROJECT_STATUS.md - Project status, metrics, and roadmap
- Diagrams - PlantUML source files and rendered images
Model Information
This API uses the tarteel-ai/whisper-base-ar-quran model, which is a fine-tuned version of OpenAI's Whisper model specifically optimized for Quran recitations.
Model Performance:
- WER (Word Error Rate): 5.75%
- Validation Loss: 0.0839
For more details, see docs/model_readme.md
Development
Running in Development Mode
The API runs with auto-reload enabled by default when using the run script, which means changes to the code will automatically restart the server.
Quran Verse Matching
The application now includes full Quran verse matching:
- Automatic Download: Quran text is downloaded from GitHub on first run
- Local Caching: Quran data is cached locally for faster subsequent loads
- Fuzzy Matching: Uses Levenshtein distance for accurate verse identification
- Chunk-based Detection: Uses audio chunk boundaries as hints for verse breaks
- Confidence Scores: Returns match confidence for each identified verse
The Quran data is loaded from quran-json and cached in quran_simple.txt.
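To illustrate the fuzzy matching idea, here is a minimal, self-contained sketch of Levenshtein-based verse lookup. It is not the project's implementation: the Acknowledgments list RapidFuzz, while this sketch uses a plain-Python edit distance so it runs without dependencies, and the function names are hypothetical.

```python
import unicodedata


def strip_tashkeel(text: str) -> str:
    """Drop combining marks (Arabic diacritics) so plain and vocalized text compare equal."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0.0, 1.0]; 1.0 means identical after stripping marks."""
    a, b = strip_tashkeel(a), strip_tashkeel(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(transcribed: str, verses: dict) -> tuple:
    """Return (reference, score) for the verse closest to the transcribed text."""
    return max(((ref, similarity(transcribed, text)) for ref, text in verses.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    verses = {
        (1, 1): "بسم الله الرحمن الرحيم",
        (1, 2): "الحمد لله رب العالمين",
    }
    print(best_match("بسم الله الرحمان الرحيم", verses))
```

The normalized score plays the same role as the match_confidence value in the API response, although the exact formula used by the service may differ.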
Performance Optimization
- GPU Acceleration: The API automatically uses GPU if available (CUDA)
- Model Caching: The model is loaded once at startup and reused for all requests
- Async Processing: FastAPI handles requests asynchronously for better throughput
- Smart Chunking: Audio is split by silence detection for better accuracy and memory efficiency
For more details on the chunking implementation, see docs/chunking_implementation.md
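As a rough illustration of silence-based chunking, the sketch below splits a sequence of per-frame loudness values wherever a long enough silent run occurs. It is a simplified stand-in, not the project's pydub-based code; the function name, the threshold, and the frame-based units are assumptions made for the example.

```python
def split_on_silence_frames(levels_db, silence_thresh_db=-40.0, min_silence_frames=3):
    """Return (start, end) frame index pairs (end exclusive) of non-silent chunks.

    A chunk is closed only when at least min_silence_frames consecutive frames
    fall below silence_thresh_db; shorter pauses stay inside the chunk.
    """
    chunks = []
    chunk_start = None   # index where the current chunk began, or None
    silent_run = 0       # length of the current run of silent frames
    for i, level in enumerate(levels_db):
        if level < silence_thresh_db:
            silent_run += 1
            if chunk_start is not None and silent_run == min_silence_frames:
                # The chunk ended where this silent run started.
                chunks.append((chunk_start, i - min_silence_frames + 1))
                chunk_start = None
        else:
            if chunk_start is None:
                chunk_start = i
            silent_run = 0
    if chunk_start is not None:
        # Close the last chunk, trimming any short trailing silence.
        chunks.append((chunk_start, len(levels_db) - silent_run))
    return chunks


if __name__ == "__main__":
    levels = [-10, -12, -60, -60, -60, -11, -60, -9]
    print(split_on_silence_frames(levels))
```

Requiring a minimum silence length is what keeps brief pauses inside an ayah from producing spurious chunk boundaries.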
Troubleshooting
Model Download Issues
If the model fails to download, ensure you have:
- Stable internet connection
- Sufficient disk space (~500MB for the model)
- Access to Hugging Face (not blocked by firewall)
Audio Processing Errors
If audio processing fails:
- Ensure the audio file is not corrupted
- Check that the file format is supported
- Verify the file is not empty
Memory Issues
If you encounter out-of-memory errors:
- Close other applications to free up RAM
- Consider using a smaller batch size
- Use GPU if available
🔬 Algorithm Overview
The system uses a sophisticated constraint propagation algorithm for verse matching:
- Constraint Propagation: Analyzes multiple word batches and intersects results to identify the correct surah
- Backward Gap Filling: Detects and fills missing ayahs before the identified starting point
- Forward Consecutive Matching: Continues matching with tolerance for repeated phrases
For complete technical details, see docs/ALGORITHM.md.
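The surah identification step can be pictured as intersecting candidate sets across word batches. The sketch below is a toy version of that idea under simplifying assumptions: batches are matched by exact phrase containment rather than fuzzy scoring, and all names are hypothetical. docs/ALGORITHM.md describes the actual algorithm.

```python
def candidate_surahs(batch_words, surah_texts):
    """Surah numbers whose text contains the batch as a whole-word phrase."""
    phrase = " " + " ".join(batch_words) + " "
    return {num for num, text in surah_texts.items() if phrase in f" {text} "}


def identify_surah(words, surah_texts, batch_size=3):
    """Narrow the set of possible surahs by intersecting per-batch candidates."""
    remaining = set(surah_texts)
    for start in range(0, len(words) - batch_size + 1, batch_size):
        found = candidate_surahs(words[start:start + batch_size], surah_texts)
        if not found:
            continue  # batch matched nothing, e.g. a transcription error
        narrowed = remaining & found
        if narrowed:
            # Only accept the constraint if it leaves at least one candidate.
            remaining = narrowed
    return remaining


if __name__ == "__main__":
    # Toy corpus: Latin placeholders stand in for surah texts.
    corpus = {1: "a b c d e f", 2: "a b c x y z", 3: "q r s d e f"}
    print(identify_surah("a b c d e f".split(), corpus))
```

Each batch alone is ambiguous here (two surahs share "a b c", two share "d e f"), but the intersection leaves a single candidate, which is the point of combining several batches.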
⚠️ Important: Recent Refactoring (2025-10-08)
Major architectural changes have been made to improve accuracy and simplicity:
What Changed
- Removed Word Timestamps: Eliminated inaccurate linear interpolation of word timestamps
- Silence-Based Boundaries: Now relies solely on naturally detected silence boundaries from audio chunks (step 03_chunks_merged)
- Uthmani Text: Uses full Uthmani tashkeel from res/quran-uthmani_all.txt
- Direct Chunk Mapping: Simplified chunk mapping to use direct timestamp overlap instead of fuzzy text matching
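Mapping by direct timestamp overlap reduces to a standard interval test. The following sketch shows one way it could work, assuming chunks and ayah spans are `(start, end)` pairs in seconds; the function names and the snapping step are illustrative assumptions, not the repository's code.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def overlapping_chunks(span, chunks):
    """Indices of chunks that share any time with the given (start, end) span."""
    start, end = span
    return [i for i, (c_start, c_end) in enumerate(chunks)
            if overlap_seconds(start, end, c_start, c_end) > 0]


def snap_to_chunks(span, chunks):
    """Replace a rough span with the boundaries of the chunks it overlaps."""
    hits = overlapping_chunks(span, chunks)
    if not hits:
        return span  # nothing to snap to; keep the original estimate
    return (chunks[hits[0]][0], chunks[hits[-1]][1])


if __name__ == "__main__":
    chunks = [(0.0, 2.0), (2.3, 4.5), (5.0, 9.0)]
    print(snap_to_chunks((1.5, 4.0), chunks))
```

Because the final boundaries always come from detected silence, no per-word timing has to be estimated.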
Why This Matters for Future Development
- No Word-Level Timestamps: The API no longer returns word_timestamps in the response
- Chunk Boundaries are Truth: All timing relies on pydub's silence detection (step 03_chunks_merged)
- Simplified Architecture: ~200 lines of complex logic removed for better maintainability
- Uthmani Text Source: Always use quran_data.get_verse_with_tashkeel() for ayah text
Future Word Timestamp Implementation
When adding accurate word timestamps back:
- Use Whisper's built-in return_timestamps='word' parameter
- Or integrate forced alignment tools (wav2vec2, Montreal Forced Aligner)
- Do NOT use linear interpolation (time_per_word = duration / word_count)
See REFACTORING_SUMMARY.md for complete details.
🚀 Future Enhancements
- Multi-surah detection (continuous recitations)
- Partial ayah support (start/end word tracking)
- Web UI for testing
- Reciter identification
- Real-time streaming transcription
- Support for different Qira'at (Hafs, Warsh, etc.)
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
The Whisper model is licensed under Apache 2.0 by OpenAI.
🙏 Acknowledgments
- Model: tarteel-ai/whisper-base-ar-quran by Tarteel AI
- Base Model: OpenAI Whisper
- Framework: FastAPI
- Quran Data: PyQuran v1.0.1
- Audio Processing: pydub, librosa
- Fuzzy Matching: RapidFuzz
🤝 Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📧 Support
For issues or questions:
- Open an issue on GitHub
- Check the documentation
🌟 Star History
If you find this project useful, please consider giving it a star! ⭐
Repository: https://github.com/sayedmahmoud266/quran-ai-transcriping
Version: 2.0.0
Last Updated: October 2025
