Audio Search System

This system provides a workflow to compress, transcribe, section, and search audio content. Transcription uses OpenAI's Whisper model.

Setup and Workflow

Place all audio files in the input folder.

Run the compression script to process any files larger than 25 MB:

python compress.py

Files over 25 MB are compressed, and the compressed versions are saved in the input_compressed folder. Smaller files are left as they are.
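
To preview which files the compression step would pick up, a size check along the following lines may help. This is a minimal sketch, not the contents of compress.py: the function name files_needing_compression is hypothetical, and reading 25 MB as 25 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def files_needing_compression(input_dir, limit=SIZE_LIMIT_BYTES):
    """Return the files in input_dir larger than limit bytes, sorted by path."""
    folder = Path(input_dir)
    if not folder.is_dir():
        # A missing folder simply means there is nothing to compress.
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.stat().st_size > limit
    )


if __name__ == "__main__":
    # Hypothetical usage: list oversized files in the input folder.
    for path in files_needing_compression("input"):
        size_mb = path.stat().st_size / (1024 * 1024)
        print(f"{path.name}: {size_mb:.1f} MB")
```

Run from the project root, the script prints one line per file in the input folder that exceeds the threshold.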

Generate subtitle transcripts from audio files using OpenAI's Whisper model:

python transcript.py

Transcripts will be saved in the transcripts folder.
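
The subtitle format is not named here. Assuming it is SRT, and assuming each transcribed segment comes with a start time, an end time, and text, the conversion to subtitle text could look roughly like this. The helper names srt_timestamp and segments_to_srt are hypothetical and are not taken from transcript.py.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as the text of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical segments, for illustration only.
    print(segments_to_srt([(0.0, 2.5, "Welcome back."), (2.5, 5.0, "Let's begin.")]))
```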

Process the transcripts to divide them into logical sections:

python section.py

The sectioned content will be stored in the sections folder.
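
How section.py decides where a logical section begins is not described here. As one simple stand-in heuristic, the sketch below starts a new section whenever the pause between two consecutive segments exceeds a threshold. The function name split_into_sections and the max_gap parameter are hypothetical, and the actual script may use an entirely different criterion.

```python
def split_into_sections(segments, max_gap=2.0):
    """Group (start, end, text) segments into sections of joined text.

    A new section starts when the silence between the end of one segment
    and the start of the next is longer than max_gap seconds.
    """
    sections = []
    current = []
    previous_end = None
    for start, end, text in segments:
        if current and start - previous_end > max_gap:
            sections.append(" ".join(current))
            current = []
        current.append(text.strip())
        previous_end = end
    if current:
        sections.append(" ".join(current))
    return sections


if __name__ == "__main__":
    # Hypothetical segments: the long pause before the third one starts a new section.
    demo = [(0.0, 1.0, "First point."), (1.2, 2.0, "More detail."), (6.0, 7.0, "New topic.")]
    for number, section in enumerate(split_into_sections(demo), start=1):
        print(number, section)
```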

Search through the processed content with natural language queries:

python query.py "your search query here"

Example:

python query.py "favor system mechanics"
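
The retrieval method behind query.py is not specified here. To illustrate the general idea of ranking sections against a query, the sketch below uses plain TF-IDF term matching from the standard library. This is a swapped-in technique for illustration, and the names tokenize and rank_sections are hypothetical. A real natural-language search would more likely rely on embeddings or a dedicated index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sections(query, sections):
    """Return (score, section_index) pairs, best match first.

    Sections that share no term with the query are left out.
    """
    docs = [Counter(tokenize(section)) for section in sections]
    n_docs = len(docs)
    # Document frequency: in how many sections each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)

    def idf(term):
        # Smoothed inverse document frequency.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

    query_terms = set(tokenize(query))
    results = []
    for index, doc in enumerate(docs):
        length = sum(doc.values()) or 1
        score = sum(doc[t] / length * idf(t) for t in query_terms if t in doc)
        if score > 0:
            results.append((score, index))
    return sorted(results, key=lambda r: (-r[0], r[1]))


if __name__ == "__main__":
    # Hypothetical section texts, for illustration only.
    demo_sections = [
        "The favor system rewards players",
        "Combat rules and dice",
        "Favor system mechanics overview",
    ]
    for score, index in rank_sections("favor system mechanics", demo_sections):
        print(f"{score:.3f}  {demo_sections[index]}")
```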

If the contents of the sections folder have changed since your last query, pass --regenerate-index to rebuild the search index before the query runs:

python query.py --regenerate-index "your search query here"
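
To decide whether the flag is needed, one option is to compare modification times. The sketch below assumes the index is stored as a single file. The function name index_is_stale and the index path are hypothetical, since the location of the index is not documented here.

```python
from pathlib import Path


def index_is_stale(sections_dir, index_path):
    """Return True if the index file is missing or older than any section file."""
    index = Path(index_path)
    if not index.exists():
        return True
    index_mtime = index.stat().st_mtime
    return any(
        p.stat().st_mtime > index_mtime
        for p in Path(sections_dir).rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Hypothetical index location; adjust to wherever the index actually lives.
    if index_is_stale("sections", "index.bin"):
        print("Sections changed: rerun query.py with --regenerate-index")
```

This check only notices added or modified files. A deleted section leaves no newer timestamp behind, so regenerate the index manually after removing files.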