Bundling time by smoothing windows of data.
The shape of data as it changes over time is incredibly sensitive and important in how we and the computer interpret it. Often, problems such as noise (my data is very wiggly), statistical outliers (points of data that might seem outlandish) pollute our data and make it seem unwieldy to use. That said, these values which might look anomalous can also be true representations of what was analysed, particularly in the quantitative manner that computer’s perform their work. One reason for this is that audio analysis is incredibly granular and we, the human in the loop, don’t necessarily grasp why a particularly large, small or weird set of numbers might enter some of our analysis. A common theme that we’ve observed while workshopping with the tools is the expectation that analysis is somehow truthful to listening. This is not always true – signals are not sounds and analysis in the computer is more dissimilar to human perception than we would like to admit.
Now, this isn’t intended to scare you away from the toolkit, or machine listening from the get go, rather, to be honest about the assumptions we make when we sit down and music our way through some data. There are many simple and complex techniques we can leverage to treat, filter and sanitise data so that time becomes better represented in the analysis and perhaps, we hope, aligns better with how we interpret a sound.
To start with an approachable and tractable technique I’ll introduce the notion of smoothing. When someone says they’ve smoothed their data, they mean that they have found a way to remove irregularities or inconsistencies from it. There is no singular way to do this. One efficient, understandable and effective way that you can smooth some data is to calculate a moving average. To do this, contiguous blocks or windows of data are grouped together and averaged. The average of each window replaces the values. The widget below shows how calculating the moving average affects the nature of the data. You’ll notice as you increase or decrease the window size, the smoothness of the data becomes more jagged or more smooth respectively. The grey plot represents the original data, so that you can compare it smoothed and raw.
Why is this useful? Well, smoothing can often provide a straightforward improvement to data that is noisy or otherwise “erratic”. For example, imagine a scenario where we want to analyse the timbre of a drum loop in order to differentiate different drum hits. To do this, you would send a signal into the MFCC object, and store the MFCC data at different moments in a DataSet. However, MFCC data is produced every spectral frame. With the default FFT settings (1024 -1 -1) and at 44100 Hz sample rate this is around every 11ms! The timbre of a drum kit changes quite fast as different percussion instruments are hit by the player. This means that as we “snapshot” different moments in time we might be capturing these transitory and instantaneous frames, rather than a general indication of the timbral shape encoded in the MFCC data. Experiment with the widget below by playing with the amount of smoothing applied realtime MFCC analysis.
Playing with the above widget you might have noticed that trade-offs are made with smoothing amounts at the extrema of the slider. If you have no smoothing (0), the data is very wiggly or noisy. If you wanted to snapshot moments in time to build a set of classifier data (such as in the classifier tutorial), at any given point you may be capturing MFCC data which is silence between hits, the tail end of a crash cymbal or some other anomalous spectral frame. However, if you smooth the data too much, you’ll notice that the data barely changes. The differentiable aspects of the data end up being diluted because the window of time that is being analysed ends up regressing to the mean. This is the crux of time resolution. You want to smooth enough that you mitigate against fast erroneous changes in data, but not so much that you end up filtering out the characteristics of the data that end up providing meaningful differentiable information.