Robust scaling of a DataSet
RobustScale transforms a DataSet so each dimension in the data meets two criteria:
- it has a median of 0 and
- the range between two specified percentiles is 1 (these are often the 25th and 75th percentiles).
Because RobustScale is based on percentiles, the scaling of the data is less affected by extreme outliers than other scalers (such as Standardize and Normalize). Being more robust to outliers can help RobustScale more accurately scale and represent the majority of the data.
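As a rough illustration of the two criteria above, the following Python sketch scales one dimension by subtracting its median and dividing by the range between two percentiles. It is not FluCoMa's implementation: the helper names `percentile` and `robust_scale` are hypothetical, and the linear-interpolation percentile rule is an assumption, so exact values may differ from what RobustScale produces.

```python
from statistics import median

def percentile(values, p):
    """Percentile p (0-100), linearly interpolated between sorted neighbours.
    Assumption: FluCoMa may use a different interpolation rule."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def robust_scale(column, low=25.0, high=75.0):
    """Shift one dimension so its median is 0, then divide by the
    range between the low and high percentiles so that range becomes 1."""
    centre = median(column)
    spread = percentile(column, high) - percentile(column, low)
    if spread == 0:
        spread = 1.0  # assumption: leave a constant dimension unscaled
    return [(x - centre) / spread for x in column]

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one dimension with an extreme outlier
scaled = robust_scale(data)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier (100.0) is still far from the rest after scaling, but it did not influence the scale factor: the other four values are spread evenly around 0.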
To compare the three scalers found in FluCoMa, visit Comparing Scalers.
The low and high parameters (given as percentiles between 0 and 100) specify how much of the data at each extreme is ignored when determining how to scale the data. When set to the defaults of 25 and 75 (the range between them is known as the interquartile range), the values in the top and bottom quartiles have no effect on how the data is scaled, so even if there are some extreme outliers in this dimension, the scaling reflects only how the data in the two inner quartiles is positioned. Because this is a strategy for preventing outliers from affecting the scaling, if you know at which percentiles the outliers occur, you might adjust the low and high parameters accordingly.
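The effect of the percentile choice can be sketched in Python. This is an illustration under assumptions, not FluCoMa code: `scaling_params` is a hypothetical helper, and it relies on the standard library's `statistics.quantiles`, which here only accepts whole-number percentiles from 1 to 99. It compares the median and percentile range computed for two versions of the same dimension that differ only in how extreme the outlier is.

```python
from statistics import median, quantiles

def scaling_params(column, low=25, high=75):
    """Return (median, range between the low and high percentiles).
    Assumption: low and high are whole numbers from 1 to 99 in this sketch."""
    # 99 cut points; cuts[p - 1] is the p-th percentile
    cuts = quantiles(column, n=100, method="inclusive")
    return median(column), cuts[high - 1] - cuts[low - 1]

mild = [1.0, 2.0, 3.0, 4.0, 100.0]
wild = [1.0, 2.0, 3.0, 4.0, 1000.0]  # same data, more extreme outlier

# With 25 and 75 the outlier falls outside the percentiles,
# so both versions give identical scaling parameters.
same_25_75 = scaling_params(mild) == scaling_params(wild)

# With 10 and 90 the outlier influences the 90th percentile,
# so the range (and therefore the scaling) changes with it.
range_10_90_mild = scaling_params(mild, 10, 90)[1]
range_10_90_wild = scaling_params(wild, 10, 90)[1]

print(same_25_75, range_10_90_mild < range_10_90_wild)  # True True
```

In this toy data the outlier makes up the top 20% of the points, so a high percentile above roughly 80 starts to let it influence the scaling.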