Sound Analysis, Synthesis and Processing - Homework 2

This project implements Homework #2 of the DAAP course (Space-Time-Based Source Separation). The goal is to separate multiple speech sources recorded with a two-microphone array, using binary masking in the short-time Fourier transform (STFT) domain combined with clustering.

Two mixture signals (y1.wav, y2.wav) are provided, one per microphone. Each mixture contains the same three speech sources, and the two microphones are placed 9 cm apart. The task is to design binary masks in the time-frequency domain that recover estimates of the three original sources (s1.wav, s2.wav, s3.wav).
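The 9 cm spacing determines how the inter-microphone phase difference behaves across frequency. The sketch below is illustrative only: it assumes a far-field (plane-wave) source and a speed of sound of 343 m/s, neither of which is stated in the assignment, and the function name is hypothetical.

```python
import numpy as np

C = 343.0  # ASSUMPTION: speed of sound in m/s (not given in the assignment)
D = 0.09   # microphone spacing in m, from the problem statement


def expected_phase_difference(freq_hz, angle_deg):
    """Unwrapped far-field phase difference (radians) between the two microphones.

    angle_deg is measured from broadside (0 = source equidistant from both mics).
    """
    delay = D * np.sin(np.deg2rad(angle_deg)) / C  # inter-microphone delay in s
    return 2.0 * np.pi * freq_hz * delay


# Above this frequency an end-fire source (angle = 90 degrees) produces a phase
# difference larger than pi, so the measured (wrapped) phase becomes ambiguous.
aliasing_limit_hz = C / (2.0 * D)
```

Under these assumptions the limit is roughly 1.9 kHz, which suggests that phase differences in higher-frequency bins may wrap and be harder to cluster.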

The procedure involves:

1. Implementing a custom STFT (without using built-in stft/istft functions).
2. Extracting space-time feature vectors for each time-frequency bin, including normalized magnitudes from both microphones and the inter-microphone phase difference.
3. Applying clustering (e.g., k-means with k = 3) to group time-frequency bins by source.
4. Designing binary masks from the cluster assignments and applying them to the mixture’s STFT.
5. Reconstructing the estimated source signals via inverse STFT.
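The transform and feature-extraction steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the window type (periodic Hann), window length, hop size, and the exact magnitude normalization are assumptions, and it assumes that plain FFT routines (np.fft.rfft / np.fft.irfft) are allowed even though ready-made stft/istft functions are not.

```python
import numpy as np


def _hann(win_len):
    # Periodic Hann window (ASSUMPTION: the homework may use a different window).
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(win_len) / win_len)


def stft(x, win_len=1024, hop=256):
    """Hand-written STFT. Returns a complex array of shape (n_bins, n_frames)."""
    window = _hann(win_len)
    # Zero-pad both ends so every input sample is covered by full window overlap.
    x = np.concatenate([np.zeros(win_len), x, np.zeros(win_len)])
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def istft(X, win_len=1024, hop=256, length=None):
    """Inverse STFT by weighted overlap-add, matching stft() above."""
    window = _hann(win_len)
    n_frames = X.shape[1]
    out = np.zeros(win_len + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(X, n=win_len, axis=0)
    for i in range(n_frames):
        out[i * hop : i * hop + win_len] += frames[:, i] * window
        norm[i * hop : i * hop + win_len] += window ** 2
    out = out / np.maximum(norm, 1e-12)  # undo the analysis/synthesis windowing
    out = out[win_len:]                  # drop the leading zero padding
    return out if length is None else out[:length]


def space_time_features(Y1, Y2, eps=1e-12):
    """One 3-D feature vector per time-frequency bin, shape (n_bins * n_frames, 3).

    ASSUMPTION: magnitudes are normalized by the joint magnitude of the two
    channels; the phase difference is the wrapped angle of Y2 * conj(Y1).
    """
    joint = np.sqrt(np.abs(Y1) ** 2 + np.abs(Y2) ** 2) + eps
    phase_diff = np.angle(Y2 * np.conj(Y1))
    feats = np.stack([np.abs(Y1) / joint, np.abs(Y2) / joint, phase_diff], axis=-1)
    return feats.reshape(-1, 3)
```

With a boolean mask of the same shape as the STFT, an estimated source could then be obtained as `istft(mask * Y1, length=len(y1))`, where `Y1 = stft(y1)` and `y1` is the first mixture loaded as a float array.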

The project also generates spectrograms of the mixtures, the true sources, and the separated sources, and visualizes the binary masks and the feature distributions. The final output consists of the three separated signals as WAV files, along with plots for analysis.
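As a rough sketch of the output stage, the snippet below computes the dB-scaled magnitude that a spectrogram plot would display and writes a signal to a 16-bit mono WAV file using only NumPy and the standard-library wave module. Both function names are hypothetical, and the 16-bit format, the -80 dB floor, and the peak rescaling are assumptions rather than requirements from the assignment.

```python
import wave

import numpy as np


def magnitude_db(X, floor_db=-80.0):
    """Magnitude of an STFT in dB relative to its maximum, clipped at floor_db."""
    mag = np.abs(X)
    ref = max(float(mag.max()), 1e-12)
    db = 20.0 * np.log10(np.maximum(mag, 1e-12) / ref)
    return np.maximum(db, floor_db)


def write_wav_mono16(path, signal, sample_rate):
    """Write a float signal to a 16-bit mono WAV file (ASSUMED output format)."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = float(np.max(np.abs(signal))) if signal.size else 0.0
    if peak > 1.0:
        signal = signal / peak  # rescale only when the signal would clip
    pcm = np.round(signal * 32767.0).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

The array returned by `magnitude_db` can be passed to any image-plotting routine; the sample rate should be the one read from the mixture files.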

Existing clustering libraries are used (e.g., scikit-learn), but the STFT implementation is coded from scratch.
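A minimal sketch of how scikit-learn's KMeans could turn per-bin features into binary masks is shown below. The function name, the random seed, and the synthetic demo data are illustrative assumptions; the real features would come from the two mixture STFTs.

```python
import numpy as np
from sklearn.cluster import KMeans


def binary_masks_from_features(features, tf_shape, n_sources=3, seed=0):
    """Cluster per-bin features and return one boolean mask per cluster.

    features: array of shape (n_bins * n_frames, n_features), flattened in
              row-major order from a (n_bins, n_frames) grid.
    tf_shape: the (n_bins, n_frames) shape of the STFT the masks apply to.
    """
    kmeans = KMeans(n_clusters=n_sources, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(features).reshape(tf_shape)
    return [labels == k for k in range(n_sources)]


# Synthetic demo: three well-separated blobs standing in for real features.
rng = np.random.default_rng(0)
centers = np.array([[0.9, 0.4, -1.0], [0.7, 0.7, 0.0], [0.4, 0.9, 1.0]])
demo_features = np.concatenate(
    [c + 0.01 * rng.standard_normal((20, 3)) for c in centers]
)
demo_masks = binary_masks_from_features(demo_features, tf_shape=(6, 10))
```

Each time-frequency bin belongs to exactly one mask, so the masked STFTs sum back to the original mixture STFT. The cluster order returned by k-means is arbitrary, so matching estimated sources to s1, s2, s3 has to be done separately.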

Authors: Chiara Lunghi, Alice Portentoso

