Dhravya/tem-entrepreneurship-search

Audio Search System

This system provides a workflow to compress, transcribe, section, and search audio content. Transcription uses OpenAI's Whisper model.

Setup and Workflow

1. Prepare Audio Files

Place all audio files in the input folder.

2. Compress Large Audio Files

Run the compression script to process any files larger than 25 MB:

python compress.py

This will compress files over 25 MB and save the compressed versions in the input_compressed folder.
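The selection step can be sketched as follows. This is a minimal illustration, not the contents of compress.py: the function name files_needing_compression is hypothetical, and it assumes "25 MB" means 25 × 1024 × 1024 bytes. It only lists the files that exceed the limit; the compression itself is not shown.

```python
from pathlib import Path

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def files_needing_compression(input_dir, limit=SIZE_LIMIT_BYTES):
    """Return the files in input_dir larger than limit, sorted by name.

    Hypothetical helper for illustration; returns an empty list when the
    folder does not exist.
    """
    folder = Path(input_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.stat().st_size > limit
    )


# List oversized files in the input folder with their size in MB.
for path in files_needing_compression("input"):
    print(f"{path.name}: {path.stat().st_size / 1_048_576:.1f} MB")
```

Files at or below the limit are left out of the list, which matches the description that only files over 25 MB are compressed.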

3. Generate Transcriptions

Generate subtitle transcripts from audio files using OpenAI's Whisper model:

python transcript.py

Transcripts will be saved in the transcripts folder.
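For orientation, here is a sketch of how timed segments could be written out as subtitles. It assumes the subtitle format is SRT and that each segment is a dict with start, end (in seconds), and text keys, as in Whisper's segment output. The function names are hypothetical, and the actual format produced by transcript.py may differ.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of {"start", "end", "text"} dicts as SRT text.

    Assumption: segments are already in chronological order.
    """
    blocks = []
    for number, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Keeping timestamps in the transcript means a later search hit can point back to the position in the audio.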

4. Create Logical Sections

Process the transcripts to divide them into logical sections:

python section.py

The sectioned content will be stored in the sections folder.
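How section.py chooses section boundaries is not described here. As a stand-in, the sketch below splits a transcript wherever the speaker pauses for at least a given number of seconds. This pause-based heuristic, the function name split_on_pauses, and the min_gap default are all assumptions for illustration, not the project's method.

```python
def split_on_pauses(segments, min_gap=2.0):
    """Group timed segments into sections, starting a new section
    whenever the silence between two segments is at least min_gap seconds.

    Each segment is a dict with "start" and "end" times in seconds.
    """
    sections = []
    current = []
    prev_end = None
    for seg in segments:
        gap_is_long = prev_end is not None and seg["start"] - prev_end >= min_gap
        if current and gap_is_long:
            sections.append(current)
            current = []
        current.append(seg)
        prev_end = seg["end"]
    if current:
        sections.append(current)
    return sections
```

A pause threshold is a crude proxy for a topic change; a content-aware method would likely produce more meaningful sections.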

5. Query Content

Search through the processed content with natural language queries:

python query.py "your search query here"

Example:

python query.py "favor system mechanics"

If the contents of the sections folder have changed, pass --regenerate-index so the search index is rebuilt before the query runs:

python query.py --regenerate-index "your search query here"
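The interplay between a cached index and the --regenerate-index flag can be sketched as below. This is not the implementation of query.py: it swaps in a simple word-count index with cosine similarity in place of whatever search method the project uses, and the index.json file name, the .txt extension for section files, and all function names are assumptions.

```python
import argparse
import json
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sections_dir):
    """Map each section file name to its word counts."""
    index = {}
    for path in sorted(Path(sections_dir).glob("*.txt")):
        index[path.name] = Counter(tokenize(path.read_text(encoding="utf-8")))
    return index


def load_index(sections_dir, index_path, regenerate=False):
    """Reuse the cached index unless it is missing or regenerate is set."""
    index_file = Path(index_path)
    if index_file.exists() and not regenerate:
        cached = json.loads(index_file.read_text(encoding="utf-8"))
        return {name: Counter(counts) for name, counts in cached.items()}
    index = build_index(sections_dir)
    index_file.write_text(json.dumps(index), encoding="utf-8")
    return index


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(index, query, top_k=3):
    """Return up to top_k (score, file name) pairs with a nonzero score."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, counts), name) for name, counts in index.items()]
    scored = [(s, n) for s, n in scored if s > 0]
    return sorted(scored, key=lambda item: (-item[0], item[1]))[:top_k]


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("query")
    parser.add_argument("--regenerate-index", action="store_true")
    args = parser.parse_args(argv)
    index = load_index("sections", "index.json", args.regenerate_index)
    for score, name in search(index, args.query):
        print(f"{score:.3f}  {name}")
```

Without the flag, a stale cache is returned as-is, so new or edited section files do not appear in results until the index is rebuilt. To run the sketch as a script, call main() under the usual `if __name__ == "__main__":` guard.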

Folder Structure

  • input/: Raw audio files
  • input_compressed/: Compressed audio files (< 25 MB)
  • transcripts/: Whisper-generated transcriptions
  • sections/: Logical sections extracted from transcripts