Detect outliers in a rolling window using the Hampel identifier.
hampel_outlier(x, k, threshold = 3.5)
x: A vector of numbers.
k: Width of the rolling window (an odd integer).
threshold: Threshold for labeling outliers, expressed as a multiple of the MAD. For normally distributed data this is roughly equivalent to a number of standard deviations.
A logical vector, TRUE where the corresponding value is an outlier.
The Hampel identifier uses the median absolute deviation (MAD) and a threshold to identify outliers based on their distance from the median (Davies and Gather 1993). It is a robust alternative to the commonly used \(\textrm{mean} \pm 3\sigma\) thresholds for identifying outliers (Leys et al. 2013).
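As a minimal illustration of the robustness argument, the sketch below compares the two rules on a small made-up vector. The data and variable names are hypothetical, and stats::mad() with its default scaling constant is assumed as the MAD estimate; this does not show how hampel_outlier() is implemented.

```r
# Hypothetical data: nine regular values and one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Classical rule: flag values more than 3 standard deviations from the mean.
# The extreme value inflates both the mean and the standard deviation.
sd_flag <- abs(x - mean(x)) / sd(x) > 3

# Robust rule: flag values more than 3.5 scaled MADs from the median.
# stats::mad() multiplies by 1.4826 by default (assumed here).
mad_flag <- abs(x - median(x)) / mad(x) > 3.5

any(sd_flag)     # FALSE: the extreme value masks itself
which(mad_flag)  # 10: only the extreme value is flagged
```

Here the extreme value sits about 2.8 standard deviations from the mean, so the classical rule misses it. The median and MAD are barely affected by it.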
Values are classified as outliers when
$$\frac{\left| X_i - \textrm{med}(X) \right|}{\textrm{MAD}(X)} > \textrm{threshold}$$
When the MAD is zero, the ratio on the left-hand side is undefined. In this case the function returns FALSE.
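The rule above can be sketched in a few lines of R. This is an illustrative sketch under stated assumptions, not the package's implementation: the name hampel_sketch is hypothetical, the window is assumed to be centered on each value and truncated at the ends of the vector, the MAD is assumed to be the scaled estimate returned by stats::mad(), and missing values are not handled.

```r
# Hypothetical helper, not the package's implementation.
hampel_sketch <- function(x, k, threshold = 3.5) {
  stopifnot(is.numeric(x), k %% 2 == 1)
  half <- (k - 1) / 2
  n <- length(x)
  vapply(seq_along(x), function(i) {
    # Centered window of width k, truncated at the ends (an assumption).
    window <- x[max(1, i - half):min(n, i + half)]
    centre <- stats::median(window)
    # stats::mad() scales by 1.4826 by default, so it estimates the
    # standard deviation for normal data (assumed scaling).
    spread <- stats::mad(window)
    # A zero MAD makes the ratio undefined; report FALSE in that case.
    if (spread == 0) return(FALSE)
    abs(x[i] - centre) / spread > threshold
  }, logical(1))
}

hampel_sketch(c(1, 2, 3, 100, 5, 6, 7), k = 5)
# FALSE FALSE FALSE TRUE FALSE FALSE FALSE
```

Returning FALSE for a zero MAD means that a value inside a window of otherwise identical values is never flagged, for example hampel_sketch(c(1, 1, 1, 5, 1, 1, 1), k = 3) is FALSE throughout.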
Davies L, Gather U (1993).
“The Identification of Multiple Outliers.”
Journal of the American Statistical Association, 88(423), 782–792.
ISSN 0162-1459, doi:10.1080/01621459.1993.10476339, https://www.tandfonline.com/doi/abs/10.1080/01621459.1993.10476339.
Leys C, Ley C, Klein O, Bernard P, Licata L (2013).
“Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median.”
Journal of Experimental Social Psychology, 49(4), 764–766.
ISSN 0022-1031, doi:10.1016/j.jesp.2013.03.013, http://www.sciencedirect.com/science/article/pii/S0022103113000668.
# simulate normal data and insert a single outlier at position 3
x <- rnorm(20)
x[3] <- 10
hampel_outlier(x, 5)
#> [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE