Harmonic Percussive Source Separation

A relatively cheap way of decomposing a signal into tonal and percussive parts.

HPSS uses median filters on a spectrogram to separate a sound into two layers. Because it is a simple algorithm that can run largely free of artefacts (at the cost of weaker separation), it can be used either as a decomposition in its own right or as a pre-processing step for further analysis.

The steps are:

  1. Take an STFT of the signal. Below is a short bit of guitar, where we can see the harmonics as horizontal bands, and two onsets as vertical bands:
    [Figure: Original Spectrogram]
  2. Run a median filter across each vertical strip of the magnitude spectrogram, i.e. across the frequency bins of each time slice. This smooths the spectrum out, suppressing harmonic peaks while keeping the overall energy high in any time slice that is dominated by broadband energy from a transient.
  3. Run a median filter across each horizontal strip of the original magnitude spectrogram, i.e. across each frequency bin over time. This smooths out the brief fluctuations in time caused by transients, whilst preserving the longer-term energy from harmonics.
  4. Normalise the two filter outputs with respect to each other so that, for each time-frequency cell, the two values sum to 1. This means that when they are used as masks against the original spectrum, every part of the original sound is wholly accounted for by the combination of the two masks. Each normalised mask is then multiplied against the original complex spectral data, so that the result keeps phase information as well as amplitude.
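
The four steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the Fluid Decomposition Toolkit's implementation: the helper names (`stft`, `median_filter`, `hpss`), the filter lengths (17 and 31), the FFT settings and the test signal (a steady tone plus a single click) are all assumptions chosen for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (bins, frames)."""
    frames = sliding_window_view(x, n_fft)[::hop]
    return np.fft.rfft(frames * np.hanning(n_fft), axis=1).T


def median_filter(mag, size, axis):
    """Sliding median of odd length `size` along `axis`, with edge padding."""
    pad = size // 2
    widths = [(0, 0), (0, 0)]
    widths[axis] = (pad, pad)
    padded = np.pad(mag, widths, mode="edge")
    return np.median(sliding_window_view(padded, size, axis=axis), axis=-1)


def hpss(spec, harm_size=17, perc_size=31):
    """Split a complex spectrogram into harmonic and percussive parts.

    The filter sizes are illustrative defaults, not values from the text.
    """
    mag = np.abs(spec)
    # Step 2: median across frequency bins keeps broadband (percussive) energy.
    perc = median_filter(mag, perc_size, axis=0)
    # Step 3: median across time frames keeps sustained (harmonic) energy.
    harm = median_filter(mag, harm_size, axis=1)
    # Step 4: normalise so the two masks sum to 1 in every cell.
    # Cells where both filters output 0 are split evenly (an assumption).
    total = harm + perc
    harm_mask = np.full_like(mag, 0.5)
    np.divide(harm, total, out=harm_mask, where=total > 0)
    perc_mask = 1.0 - harm_mask
    # Apply the masks to the complex data so that phase is retained.
    return harm_mask * spec, perc_mask * spec, harm_mask, perc_mask


# Step 1: STFT of a toy signal, a 1 kHz tone with a single-sample click.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000 * t)
signal[4000] += 1.0

spec = stft(signal)
harmonic, percussive, harm_mask, perc_mask = hpss(spec)
```

Because the masks sum to 1, `harmonic + percussive` reproduces `spec`. Turning each part back into audio would need an inverse STFT, which is left out here to keep the sketch short.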



You can experiment with this technique in the Fluid Decomposition Toolkit using fluid.hpss (real-time) and fluid.bufhpss (offline).