Removing Outliers with BufStats
Prevent outliers from negatively impacting BufStats' statistical summary.
For a discussion of what might constitute an outlier and more ideas about how to manage them, see the page on Outliers.
BufStats can find and remove outliers by analysing the data to set boundaries for each channel in the buffer. Any frame that has a value outside its channel’s boundaries will not be used to compute the statistics. The strictness of these boundaries is determined by `outliersCutoff` (see below for how this parameter is used). Removing outliers before computing the statistics prevents them from affecting the statistical summary, so the output of BufStats better represents the majority of the data.
The boundaries of each channel are computed using the interquartile range (IQR), thereby ensuring that the boundaries are relative to the scale and distribution of values. The lower bound is `25th percentile - (IQR * outliersCutoff)`. The upper bound is `75th percentile + (IQR * outliersCutoff)`. The 25th and 75th percentiles are also called “Q1” and “Q3” respectively, short for the 1st Quartile and 3rd Quartile.
The default `outliersCutoff` of -1 bypasses outlier removal, so all the frames are used in the statistical measurements.
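As a minimal sketch of the boundary rule described above (not BufStats' actual source code), the Python function below computes the bounds for a single channel. The name `iqr_bounds` is hypothetical, and treating every negative cutoff as a bypass is an assumption based on the documented default of -1.

```python
import math

def iqr_bounds(q1, q3, cutoff):
    """Return (lower, upper) outlier bounds for one channel.

    Hypothetical helper: q1 and q3 are the 25th and 75th percentiles,
    and cutoff plays the role of outliersCutoff.
    """
    if cutoff < 0:
        # Assumption: a negative cutoff (such as the default -1) disables
        # outlier removal, so no value can fall outside the bounds.
        return -math.inf, math.inf
    margin = (q3 - q1) * cutoff  # IQR * outliersCutoff
    return q1 - margin, q3 + margin

print(iqr_bounds(10.0, 20.0, 1.5))  # (-5.0, 35.0)
print(iqr_bounds(10.0, 20.0, -1))   # (-inf, inf)
```

A larger cutoff widens the bounds, so fewer frames are removed; a cutoff of 0 keeps only values between Q1 and Q3.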
An example
To demonstrate how this works, consider this output of a SpectralShape analysis (columns are frames and rows are channels of the features buffer):
| | FFT Frame 1 | FFT Frame 2 | FFT Frame 3 | FFT Frame 4 | FFT Frame 5 | FFT Frame 6 | FFT Frame 7 | FFT Frame 8 |
---|---|---|---|---|---|---|---|---|
Centroid | 3001.34 | 2347.71 | 2087.17 | 2217.7 | 2282.62 | 2425.79 | 2655.61 | 2607.8 |
Spread | 3182.74 | 2802.39 | 2832.76 | 3051.99 | 3180.78 | 3302.01 | 3462.62 | 3424.58 |
Skewness | 1.81 | 2.55 | 2.98 | 2.68 | 2.63 | 2.45 | 2.18 | 2.22 |
Kurtosis | 6.99 | 11.58 | 13.5 | 11.13 | 10.71 | 9.67 | 8.21 | 8.26 |
Rolloff | 9615.96 | 7792.53 | 8347.06 | 9182.46 | 9491.74 | 9785.22 | 10178.95 | 10112.94 |
Flatness | -14.93 | -17.33 | -17.35 | -16.44 | -15.56 | -14.78 | -14 | -14.18 |
Crest | 23.67 | 31.62 | 32.73 | 32.57 | 33.77 | 34.62 | 35.73 | 35.55 |
First, BufStats finds Q1 and Q3 for each channel. From these it calculates the interquartile range (IQR), which is the difference between the two values (`Q3 - Q1`). Next, a “Margin” is calculated as `IQR * outliersCutoff` (in this example, `outliersCutoff = 1.1`). Finally, the lower and upper bounds are placed one Margin below Q1 and one Margin above Q3: `lower bound = Q1 - Margin`, `upper bound = Q3 + Margin`.
| | Q1 | Q3 | IQR | Margin | Lower Bound | Upper Bound |
---|---|---|---|---|---|---|
Centroid | 2282.62 | 2607.8 | 325.18 | 357.7 | 1924.92 | 2965.49 |
Spread | 3051.99 | 3302.01 | 250.02 | 275.02 | 2776.96 | 3577.03 |
Skewness | 2.22 | 2.63 | 0.41 | 0.45 | 1.77 | 3.08 |
Kurtosis | 8.26 | 11.13 | 2.87 | 3.15 | 5.11 | 14.28 |
Rolloff | 9182.46 | 9785.22 | 602.76 | 663.03 | 8519.43 | 10448.26 |
Flatness | -16.44 | -14.78 | 1.66 | 1.83 | -18.27 | -12.95 |
Crest | 32.57 | 34.62 | 2.05 | 2.25 | 30.32 | 36.87 |
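The Centroid row of this table can be approximately reproduced with the short Python sketch below. The quartile selection rule used here (pick the nearest element of the sorted data, rounding half up, with no interpolation) is an assumption inferred from the values in the table, not a confirmed description of BufStats' internals; `nearest_rank` is a hypothetical helper name.

```python
import math

def nearest_rank(sorted_values, fraction):
    """Pick the element nearest to the given fraction of the sorted data.

    Assumption: this rule (round half up, no interpolation) is inferred
    from the example table, not taken from BufStats' source.
    """
    index = math.floor(fraction * (len(sorted_values) - 1) + 0.5)
    return sorted_values[index]

OUTLIERS_CUTOFF = 1.1  # value used in the example

# Centroid channel, FFT frames 1-8
centroid = [3001.34, 2347.71, 2087.17, 2217.7, 2282.62, 2425.79, 2655.61, 2607.8]

ordered = sorted(centroid)
q1 = nearest_rank(ordered, 0.25)
q3 = nearest_rank(ordered, 0.75)
iqr = q3 - q1
margin = iqr * OUTLIERS_CUTOFF
lower_bound = q1 - margin
upper_bound = q3 + margin

print(f"Q1={q1} Q3={q3} IQR={iqr:.2f} Margin={margin:.2f}")
print(f"lower={lower_bound:.2f} upper={upper_bound:.2f}")
```

The printed values may differ from the table in the last decimal place because of floating-point precision and rounding.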
Now, using the lower and upper bounds, BufStats checks the original values in the buffer to see if any fall outside this range.
| | FFT Frame 1 | FFT Frame 2 | FFT Frame 3 | FFT Frame 4 | FFT Frame 5 | FFT Frame 6 | FFT Frame 7 | FFT Frame 8 |
---|---|---|---|---|---|---|---|---|
Centroid | 3001.34 | 2347.71 | 2087.17 | 2217.7 | 2282.62 | 2425.79 | 2655.61 | 2607.8 |
Spread | 3182.74 | 2802.39 | 2832.76 | 3051.99 | 3180.78 | 3302.01 | 3462.62 | 3424.58 |
Skewness | 1.81 | 2.55 | 2.98 | 2.68 | 2.63 | 2.45 | 2.18 | 2.22 |
Kurtosis | 6.99 | 11.58 | 13.5 | 11.13 | 10.71 | 9.67 | 8.21 | 8.26 |
Rolloff | 9615.96 | 7792.53 | 8347.06 | 9182.46 | 9491.74 | 9785.22 | 10178.95 | 10112.94 |
Flatness | -14.93 | -17.33 | -17.35 | -16.44 | -15.56 | -14.78 | -14 | -14.18 |
Crest | 23.67 | 31.62 | 32.73 | 32.57 | 33.77 | 34.62 | 35.73 | 35.55 |
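Frames 1, 2, and 3 each have at least one value outside the boundaries: Centroid (3001.34) and Crest (23.67) in Frame 1, and Rolloff in Frame 2 (7792.53) and Frame 3 (8347.06). Because a frame is removed as a whole, all three are excluded from the statistical summary, leaving these frames: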
| | FFT Frame 4 | FFT Frame 5 | FFT Frame 6 | FFT Frame 7 | FFT Frame 8 |
---|---|---|---|---|---|
Centroid | 2217.7 | 2282.62 | 2425.79 | 2655.61 | 2607.8 |
Spread | 3051.99 | 3180.78 | 3302.01 | 3462.62 | 3424.58 |
Skewness | 2.68 | 2.63 | 2.45 | 2.18 | 2.22 |
Kurtosis | 11.13 | 10.71 | 9.67 | 8.21 | 8.26 |
Rolloff | 9182.46 | 9491.74 | 9785.22 | 10178.95 | 10112.94 |
Flatness | -16.44 | -15.56 | -14.78 | -14 | -14.18 |
Crest | 32.57 | 33.77 | 34.62 | 35.73 | 35.55 |
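The frame-rejection step can be sketched in Python as follows, using the example values and bounds from the tables above. This is an illustration rather than BufStats' implementation: `frames_within_bounds` is a hypothetical name, and treating the bounds as inclusive is an assumption (no value in this example sits exactly on a bound).

```python
# Rows are channels, columns are FFT frames 1-8 (values from the example).
features = {
    "Centroid": [3001.34, 2347.71, 2087.17, 2217.7, 2282.62, 2425.79, 2655.61, 2607.8],
    "Spread":   [3182.74, 2802.39, 2832.76, 3051.99, 3180.78, 3302.01, 3462.62, 3424.58],
    "Skewness": [1.81, 2.55, 2.98, 2.68, 2.63, 2.45, 2.18, 2.22],
    "Kurtosis": [6.99, 11.58, 13.5, 11.13, 10.71, 9.67, 8.21, 8.26],
    "Rolloff":  [9615.96, 7792.53, 8347.06, 9182.46, 9491.74, 9785.22, 10178.95, 10112.94],
    "Flatness": [-14.93, -17.33, -17.35, -16.44, -15.56, -14.78, -14, -14.18],
    "Crest":    [23.67, 31.62, 32.73, 32.57, 33.77, 34.62, 35.73, 35.55],
}

# (lower bound, upper bound) per channel, copied from the bounds table.
bounds = {
    "Centroid": (1924.92, 2965.49),
    "Spread":   (2776.96, 3577.03),
    "Skewness": (1.77, 3.08),
    "Kurtosis": (5.11, 14.28),
    "Rolloff":  (8519.43, 10448.26),
    "Flatness": (-18.27, -12.95),
    "Crest":    (30.32, 36.87),
}

def frames_within_bounds(features, bounds):
    """Return the 1-based numbers of frames with every channel in bounds."""
    num_frames = len(next(iter(features.values())))
    kept = []
    for frame in range(num_frames):
        # A single out-of-range channel is enough to reject the whole frame.
        in_range = all(
            bounds[name][0] <= values[frame] <= bounds[name][1]
            for name, values in features.items()
        )
        if in_range:
            kept.append(frame + 1)
    return kept

kept_frames = frames_within_bounds(features, bounds)
print(kept_frames)  # [4, 5, 6, 7, 8]
```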
Here are the statistics these selected frames produce:
| | Mean | Std Dev | Skewness | Kurtosis | Low | Middle | High |
---|---|---|---|---|---|---|---|
Centroid | 2437.91 | 172.63 | 0.03 | 1.35 | 2217.7 | 2425.79 | 2655.61 |
Spread | 3284.4 | 152.63 | -0.29 | 1.63 | 3051.99 | 3302.01 | 3462.62 |
Skewness | 2.43 | 0.21 | -0.06 | 1.3 | 2.18 | 2.45 | 2.68 |
Kurtosis | 9.6 | 1.21 | -0.01 | 1.31 | 8.21 | 9.67 | 11.13 |
Rolloff | 9750.26 | 375.7 | -0.28 | 1.6 | 9182.46 | 9785.22 | 10178.95 |
Flatness | -14.99 | 0.91 | -0.46 | 1.75 | -16.44 | -14.78 | -14 |
Crest | 34.45 | 1.17 | -0.43 | 1.78 | 32.57 | 34.62 | 35.73 |
For comparison, here are the statistics when all the frames are included:
| | Mean | Std Dev | Skewness | Kurtosis | Low | Middle | High |
---|---|---|---|---|---|---|---|
Centroid | 2453.22 | 272.89 | 0.67 | 2.58 | 2087.17 | 2425.79 | 3001.34 |
Spread | 3154.98 | 231.61 | -0.27 | 1.78 | 2802.39 | 3182.74 | 3462.62 |
Skewness | 2.44 | 0.34 | -0.32 | 2.44 | 1.81 | 2.55 | 2.98 |
Kurtosis | 10 | 2 | 0.15 | 2.04 | 6.99 | 10.71 | 13.5 |
Rolloff | 9313.36 | 790.44 | -0.79 | 2.32 | 7792.53 | 9615.96 | 10178.95 |
Flatness | -15.57 | 1.25 | -0.28 | 1.57 | -17.35 | -14.93 | -14 |
Crest | 32.53 | 3.62 | -1.65 | 4.66 | 23.67 | 33.77 | 35.73 |
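To see the effect on a single channel, the sketch below recomputes the mean and standard deviation of the Centroid values with and without frames 1 to 3. It uses Python's standard `statistics` module; using the population standard deviation (`pstdev`) is an assumption that happens to agree with the Std Dev column in this example.

```python
from statistics import mean, pstdev

# Centroid channel, FFT frames 1-8 (values from the example).
centroid_all = [3001.34, 2347.71, 2087.17, 2217.7, 2282.62, 2425.79, 2655.61, 2607.8]

# Frames 1-3 were rejected as outliers, so only frames 4-8 remain.
centroid_kept = centroid_all[3:]

for label, values in (("all frames", centroid_all), ("outliers removed", centroid_kept)):
    print(f"{label}: mean={mean(values):.2f} std dev={pstdev(values):.2f}")
```

The mean changes only slightly, but the standard deviation drops considerably once the outlying frames are excluded. Printed values may differ from the tables in the last decimal place because of floating-point precision and rounding.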