m-aparna108/Audio_Analysis
Audio analysis of English podcast clips using a Hugging Face dataset. Includes preprocessing, feature extraction (MFCC), and exploratory data analysis (waveforms, Mel spectrograms, amplitude statistics).
Audio Analysis: Automated Speech Data Exploration
This project demonstrates audio analysis using a small English podcast dataset from Hugging Face. The main goal is to understand how audio data is loaded, preprocessed, and analyzed before applying ML or speech recognition techniques.
Dataset
Project Overview
Steps Performed
Load Audio
- Dataset loaded using the Hugging Face datasets library and decoded into NumPy arrays.
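As a rough illustration of this step: in datasets releases that decode an Audio column to a dict, each example carries the waveform under "array" and the rate under "sampling_rate" (newer releases may return a decoder object instead). The sketch below assumes that dict layout and a column named "audio"; the helper to_numpy_audio and the synthetic example are hypothetical, used so the snippet runs without downloading the dataset.

```python
from __future__ import annotations

import numpy as np


def to_numpy_audio(example: dict, column: str = "audio") -> tuple[np.ndarray, int, float]:
    """Return (samples, sampling_rate, duration_seconds) for one decoded example.

    Assumes the decoded column is a dict with "array" and "sampling_rate" keys.
    """
    audio = example[column]
    samples = np.asarray(audio["array"], dtype=np.float32)
    sr = int(audio["sampling_rate"])
    duration = samples.shape[-1] / sr  # last axis is time, so (channels, samples) also works
    return samples, sr, duration


# Synthetic stand-in for a decoded example: 1 s of a 440 Hz tone at 22.05 kHz.
# With a loaded dataset ds (its name is not given here), the call would be to_numpy_audio(ds[0]).
sr = 22_050
t = np.arange(sr) / sr
fake_example = {"audio": {"array": 0.5 * np.sin(2 * np.pi * 440 * t), "sampling_rate": sr}}

samples, rate, duration = to_numpy_audio(fake_example)
print(samples.dtype, rate, duration)
```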
Preprocessing
- Resample to 16 kHz
- Convert to mono
- Normalize amplitude
- Noise reduction
- Silence removal
- Voice Activity Detection (VAD)
- Framing and windowing
- Feature extraction (MFCC)
- Padding/trimming to fixed length (5 seconds)
- Data augmentation (demonstrated on the first clip)
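The resampling, mono conversion, normalization, and padding/trimming steps can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the notebook's code: the linear-interpolation resampler is a crude stand-in for librosa or torchaudio resampling (it applies no anti-aliasing filter), peak normalization is assumed as the normalization method, and the helper names are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000    # 16 kHz target rate from the preprocessing list
TARGET_SECONDS = 5.0  # fixed clip length from the preprocessing list


def to_mono(y):
    # Average channels for a (channels, samples) array; 1-D input is already mono.
    return y.mean(axis=0) if y.ndim == 2 else y


def resample_linear(y, orig_sr, target_sr):
    # Linear interpolation onto the new time grid. No anti-aliasing filter is applied,
    # so this is only a stand-in for librosa.resample or torchaudio resampling.
    if orig_sr == target_sr:
        return y
    n_out = int(round(y.size * target_sr / orig_sr))
    old_t = np.arange(y.size) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, y)


def peak_normalize(y, eps=1e-9):
    # Scale so the largest absolute sample is 1.0; near-silent clips are left unchanged.
    peak = np.max(np.abs(y))
    return y / peak if peak > eps else y


def pad_or_trim(y, sr, seconds):
    # Zero-pad the end, or cut, so the clip has exactly int(sr * seconds) samples.
    target_len = int(sr * seconds)
    if y.size >= target_len:
        return y[:target_len]
    return np.pad(y, (0, target_len - y.size))


def preprocess(y, sr):
    y = to_mono(np.asarray(y, dtype=np.float64))
    y = resample_linear(y, sr, TARGET_SR)
    y = peak_normalize(y)
    return pad_or_trim(y, TARGET_SR, TARGET_SECONDS).astype(np.float32)


# Usage: a 2 s stereo clip at 44.1 kHz becomes a 5 s mono clip at 16 kHz.
stereo = np.random.default_rng(0).uniform(-0.3, 0.3, size=(2, 2 * 44_100))
clip = preprocess(stereo, 44_100)
print(clip.shape, clip.dtype)
```

Normalizing before padding keeps the zero padding from affecting the peak, and fixing every clip to 80,000 samples (5 s at 16 kHz) lets clips be stacked into a single batch array.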
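Framing, windowing, and silence removal can be illustrated with a simple short-time energy rule. The notebook lists noisereduce and webrtcvad for noise reduction and VAD; the sketch below swaps in a plain energy threshold instead, so it only approximates those tools. The 25 ms / 10 ms framing, the 20 ms silence frames, the -40 dB threshold, and the helper names are all assumptions.

```python
import numpy as np


def frame_signal(y, frame_len, hop):
    """Slice y into frames of shape (n_frames, frame_len); the tail is zero-padded."""
    y = np.asarray(y, dtype=np.float32)
    n_frames = 1 + max(0, int(np.ceil((len(y) - frame_len) / hop)))
    pad = (n_frames - 1) * hop + frame_len - len(y)
    padded = np.pad(y, (0, pad))
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return padded[idx]


def remove_silence(y, sr, frame_ms=20.0, threshold_db=-40.0):
    """Drop non-overlapping frames whose RMS, relative to the loudest frame, is <= threshold_db.

    Energy-threshold stand-in for webrtcvad; returns (trimmed_signal, keep_mask).
    """
    frame_len = int(sr * frame_ms / 1000)
    frames = frame_signal(y, frame_len, hop=frame_len)  # hop == frame_len: no overlap
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    rms_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    keep = rms_db > threshold_db
    return frames[keep].reshape(-1), keep


sr = 16_000
t = np.arange(sr // 2) / sr                              # 0.5 s time axis
tone = 0.5 * np.sin(2 * np.pi * 220 * t)                 # speech-like placeholder
y = np.concatenate([tone, np.zeros(sr // 2), tone])      # sound / silence / sound

frames = frame_signal(y, frame_len=400, hop=160)         # 25 ms frames, 10 ms hop
windowed = frames * np.hanning(400)                      # Hann window tapers frame edges
trimmed, keep = remove_silence(y, sr)
print(windowed.shape, keep.sum(), len(trimmed))
```

The overlapping, windowed frames are what a spectral feature such as MFCC consumes, while the non-overlapping frames used for silence removal can simply be concatenated back into a shorter signal.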
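To make the MFCC step concrete, the usual recipe (power spectrum of Hann-windowed frames, triangular mel filterbank, log, then an orthonormal DCT-II) can be written in NumPy. This is an assumption-laden sketch: the notebook most likely calls librosa, and the values here will not match librosa.feature.mfcc exactly, since librosa defaults to a Slaney-style mel scale and a decibel log rather than the HTK mel formula and natural log used below. Frame sizes, filter counts, and function names are illustrative.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1), evenly spaced on the mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii_ortho(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis


def mfcc(y, sr=16_000, n_fft=512, win_len=400, hop=160, n_mels=40, n_mfcc=13):
    """Return MFCCs of shape (n_mfcc, n_frames); the log-mel spectrogram is an intermediate."""
    y = np.asarray(y, dtype=np.float64)
    if y.size < win_len:
        y = np.pad(y, (0, win_len - y.size))
    n_frames = 1 + (y.size - win_len) // hop  # a trailing partial frame is dropped
    idx = hop * np.arange(n_frames)[:, None] + np.arange(win_len)[None, :]
    frames = y[idx] * np.hanning(win_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)  # (n_frames, n_mels)
    return (log_mel @ dct_ii_ortho(n_mfcc, n_mels).T).T


sr = 16_000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)  # 1 s, 1 kHz test tone
feats = mfcc(tone, sr)
print(feats.shape)
```

Keeping only the first 13 DCT coefficients compresses the 40 log-mel energies per frame into a smoother spectral-envelope summary, which is why MFCCs are a common compact input for speech models.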
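The source does not say which augmentations were applied to the first clip. As a purely illustrative sketch, here are three common ones (additive white noise at a target SNR, a random circular time shift, and a gain change); the parameter values, the sine-wave stand-in clip, and the helper names are assumptions.

```python
import numpy as np


def add_noise(y, snr_db, rng):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)


def time_shift(y, max_shift, rng):
    """Circularly shift the clip by a random offset in [-max_shift, max_shift] samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, shift)


def change_gain(y, gain_db):
    """Scale amplitude by gain_db decibels (+6 dB roughly doubles the amplitude)."""
    return y * 10 ** (gain_db / 20)


rng = np.random.default_rng(0)
sr = 16_000
y = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # stand-in for the first clip

noisy = add_noise(y, snr_db=20, rng=rng)
shifted = time_shift(y, max_shift=sr // 10, rng=rng)
louder = change_gain(y, gain_db=6.0)

measured_snr = 10 * np.log10(np.mean(y ** 2) / np.mean((noisy - y) ** 2))
print(round(measured_snr, 1))
```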
EDA (Exploratory Data Analysis)
- Audio duration distribution
- Waveform and Mel Spectrogram visualization
- Amplitude statistics (mean, max, min)
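The duration distribution and the amplitude statistics (mean, max, min) reduce to a few NumPy calls per clip. The sketch below runs on synthetic clips so it is self-contained; the silence_ratio field and its 1e-3 threshold are an added, hypothetical measure, included because the observations note that some clips contain significant silence. The resulting durations could then be plotted as a histogram with matplotlib.

```python
import numpy as np


def clip_stats(samples, sr, silence_thresh=1e-3):
    """Duration and amplitude statistics for one clip.

    silence_ratio is the fraction of samples with |amplitude| below silence_thresh,
    a rough, hypothetical proxy for how much of the clip is silent.
    """
    y = np.asarray(samples, dtype=np.float64)
    return {
        "duration_s": y.size / sr,
        "mean": float(y.mean()),
        "max": float(y.max()),
        "min": float(y.min()),
        "silence_ratio": float(np.mean(np.abs(y) < silence_thresh)),
    }


# Synthetic clips of 2.0, 3.5, and 5.0 s; the second ends with 1.5 s of silence.
sr = 16_000
rng = np.random.default_rng(42)
clips = [
    rng.normal(0.0, 0.1, size=2 * sr),
    np.concatenate([rng.normal(0.0, 0.1, size=2 * sr), np.zeros(int(1.5 * sr))]),
    rng.normal(0.0, 0.1, size=5 * sr),
]
stats = [clip_stats(c, sr) for c in clips]
durations = np.array([s["duration_s"] for s in stats])
counts, edges = np.histogram(durations, bins=3)  # duration distribution
for s in stats:
    print(s)
print(counts, edges)
```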
Libraries Used
- datasets (Hugging Face)
- librosa
- numpy
- matplotlib
- torchaudio
- noisereduce
- webrtcvad
- IPython.display
Results / Observations
- Preprocessing standardizes audio clips for further ML tasks.
- Mean amplitude across clips is near zero; some clips contain significant silence.
- Mel spectrograms reveal how the frequency content of speech changes over time, which is useful for automatic speech recognition (ASR) or embedding tasks.
- The pipeline demonstrates how raw audio can be cleaned and transformed into model-ready features.
Usage
- Clone this repository.
- Open the notebook in Google Colab.
- Install the required libraries listed above.
- Run cells step by step to explore audio preprocessing and EDA.
Key Takeaway
This notebook provides a hands-on understanding of audio preprocessing, feature extraction, and analysis for a beginner-friendly speech dataset. It prepares the audio data for downstream tasks like speech recognition, embedding extraction, and topic segmentation.
Languages: Jupyter Notebook 100.0%
MIT License
Created December 12, 2025
Updated December 12, 2025