Apply timeseries decomposition (and anomaly detection) over a sliding/tiled window

依然范特西╮ submitted on 2019-12-11 15:39:27

Question


The anomaly detection methods that Twitter published and has since abandoned have been forked and maintained separately in the anomalize package and the hrbrmstr/AnomalyDetection fork. Both implement 'tidy' interfaces.

Working static versions

library(dplyr)
library(anomalize)  # provides tidyverse_cran_downloads and time_decompose()

tidyverse_cran_downloads %>% 
  filter(package == "tidyr") %>% 
  ungroup() %>% 
  select(-package) -> one_package_only

one_package_only %>% 
  anomalize::time_decompose(count,
                 merge = TRUE,
                 method = "twitter",
                 frequency = "7 days") -> one_package_only_decomp

one_package_only_decomp %>%
  anomalize::anomalize(remainder, method = "iqr") %>%
  anomalize::time_recompose()


one_package_only_decomp %>% 
  select(date, remainder) %>%
  AnomalyDetection::ad_ts(max_anoms = 0.02,
        direction = 'both')

These work as expected.

I would like to apply the Twitter anomaly detection process over a tiled window to my own dataset, which is similar in structure to anomalize::tidyverse_cran_downloads: a regularly spaced series of more than 100 observations of a value, grouped by a categorical variable.

The tsibble package (which supersedes the older tibbletime) can apply a function over windows with purrr-like syntax via slide(), tile() and stretch(). As with purrr, the result can be a full data-frame-like object nested inside another data-frame-like object. (What a sentence!)
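To make the three window types concrete, here is a base-R sketch of which rows each window covers for a series of length n. The helper window_indices() is hypothetical and only mimics the shapes; tsibble's own functions also handle alignment, filling and the starting window (for example, stretch() here simply starts from a window of `size` rows).

```r
# Hypothetical helper (not part of tsibble): row indices covered by each window
window_indices <- function(n, size, type = c("slide", "tile", "stretch")) {
  type <- match.arg(type)
  switch(type,
    # slide: overlapping windows that advance one observation at a time
    slide = lapply(seq_len(n - size + 1), function(i) i:(i + size - 1)),
    # tile: non-overlapping blocks of `size` observations
    tile = unname(split(seq_len(n), ceiling(seq_len(n) / size))),
    # stretch: windows anchored at row 1 that grow by one observation each time
    stretch = lapply(seq(size, n), function(i) seq_len(i))
  )
}

str(window_indices(6, 3, "slide"))
str(window_indices(6, 3, "tile"))
str(window_indices(6, 3, "stretch"))
```

For the daily-refresh use case described further down, tile-like (non-overlapping) steps for *when* to recompute combined with slide-like (look-back) windows for *what* to decompose is the shape being asked for.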

I've gone through the window function vignette but haven't had much luck.

Attempt 1: slide2

The anomalize::decompose_twitter() function takes two arguments, data and target.

tidyverse_cran_downloads %>%
  mutate(
    Monthly_MA = slide2_dfr(
      .x = .,
      .y = count,
      ~ anomalize::decompose_twitter,
      .size = 5
    )
  )

Error: Element 1 has length 3, not 1 or 425. Call rlang::last_error() to see a backtrace

Maybe I've misunderstood how the .x .y syntax works?
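A slide2-style function windows .x and .y in lockstep, so both are expected to be parallel vectors of the same length, and .f receives one window of each. In Attempt 1, `.x = .` is the whole data frame (the magrittr placeholder), the formula `~ anomalize::decompose_twitter` evaluates to the function object itself instead of calling it, and mutate() needs each new column to have length 1 or one value per row, as the error states. The base-R sketch below only illustrates the lockstep windowing; slide2_sketch() is a hypothetical stand-in, not tsibble's implementation.

```r
# Hypothetical stand-in for a slide2-style iterator (not tsibble's code):
# .x and .y are cut into the same windows and passed to .f together.
slide2_sketch <- function(.x, .y, .f, .size) {
  stopifnot(length(.x) == length(.y))
  starts <- seq_len(length(.x) - .size + 1)
  lapply(starts, function(i) {
    idx <- i:(i + .size - 1)
    .f(.x[idx], .y[idx])
  })
}

dates  <- as.Date("2017-01-01") + 0:9
counts <- c(5, 7, 6, 9, 12, 8, 7, 6, 10, 11)

# .f must actually be called on the two windows it receives
res <- slide2_sketch(dates, counts,
                     function(d, n) data.frame(date = d, count = n),
                     .size = 5)
length(res)   # one result per complete window
```

Because each window returns a whole data frame, the results belong in a list-column (or are row-bound afterwards), not in a per-row column created with mutate().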

Attempt 2: pslide

my_diag <- function(...) {
  data <- tibble(...)
  fit <- anomalize::decompose_twitter(data = data, target = count)
}

tidyverse_cran_downloads %>%
  nest(-package) %>%
  filter(package %in% c("tidyr", "lubridate")) %>%  # just to make it quick
  mutate(diag = purrr::map(data, ~ pslide_dfr(., my_diag, .size = 7)))

Error in stats::stl(., s.window = "periodic", robust = TRUE) : series is not periodic or has less than two periods

It appears something is running, but the period between observations is somehow off, or isn't being parsed?

Attempt 3: ad_ts

ad_ts takes only one data argument, so, setting aside the fact that we have yet to find a way to calculate the remainder after decomposition, I should be able to use it via slide. It also expects its x to be:

Time series as a two column data frame where the first column consists of the timestamps and the second column consists of the observations.

So we shouldn't have to do much to the data after it's nested.

tidyverse_cran_downloads %>%
  nest(-package, .key = "my_data") %>%
  mutate(
    Daily_MA = slide_dfr(
      .f = AnomalyDetection::ad_ts,
      .x = my_data
    )
  )

Error in .f(.x[[i]], ...) : data must be a single data frame.

So the function is at least being called, but it's being passed something other than a single data frame?
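The error in Attempt 3 is consistent with slide_dfr() windowing over the list-column my_data itself, so each call receives a window (a list) of nested data frames rather than one data frame. The base-R sketch below shows the difference using needs_single_df(), a hypothetical checker that only mimics the error message; it is not ad_ts() itself, and the toy data are made up.

```r
# Hypothetical stand-in for ad_ts()'s input check
needs_single_df <- function(x) {
  if (!is.data.frame(x)) stop("data must be a single data frame.")
  nrow(x)
}

set.seed(1)
# A nested list-column: one two-column data frame (timestamp, count) per package
my_data <- list(
  tidyr     = data.frame(date = as.Date("2017-01-01") + 0:29, count = rpois(30, 100)),
  lubridate = data.frame(date = as.Date("2017-01-01") + 0:29, count = rpois(30, 80))
)

# A window over the list-column is itself a list of data frames ...
one_window <- my_data[1]
inherits(tryCatch(needs_single_df(one_window), error = identity), "error")

# ... whereas mapping over the groups hands the function one data frame each
sapply(my_data, needs_single_df)
```

Following this reasoning, the windowing has to happen over the rows inside each group's data frame (map over groups first, then slide), which is the structure the answer uses.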

I want to:

  • Apply a process of decomposition through the twitter algorithm, followed by anomaly detection on the remainder
  • Use one of the two anomaly detection packages to do it, or a blend of the two
  • Apply it to a window of time
  • Over grouped categorical data
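The first bullet (decompose, then detect anomalies on the remainder) can be sketched in base R to show the two stages. This is a stand-in, not the Twitter method: stl() replaces the median-based Twitter decomposition, and the IQR multiplier k = 3 is an assumption rather than anomalize's exact setting. In the packages themselves this corresponds to time_decompose() followed by anomalize(remainder, method = "iqr").

```r
# Sketch: decompose with stl(), then flag remainder outliers with an IQR rule.
# Assumptions: regular series, known seasonal period, k = 3 chosen for illustration.
decompose_and_flag <- function(count, period, k = 3) {
  fit <- stl(ts(count, frequency = period), s.window = "periodic", robust = TRUE)
  remainder <- as.numeric(fit$time.series[, "remainder"])
  q   <- quantile(remainder, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  data.frame(
    count     = count,
    remainder = remainder,
    anomaly   = remainder < q[[1]] - k * iqr | remainder > q[[2]] + k * iqr
  )
}

set.seed(42)
# Six weeks of synthetic daily data with a weekly pattern and one injected spike
x <- 100 + rep(c(0, 5, 10, 5, 0, -10, -10), 6) + rnorm(42)
x[20] <- x[20] + 60
res <- decompose_and_flag(x, period = 7)
which(res$anomaly)
```

Keeping decomposition and detection in one function that takes a plain vector makes it easy to call once per window and once per group.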

The only way my dataset differs is that I have half-hourly observations spanning several months, and I only need the anomalies recalculated once a day (i.e. once every 48 observations), with a window that looks back over the previous 30 days to decompose the series and detect them.
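The schedule described above (a 30-day look-back refreshed every 48 rows) reduces to a list of row ranges. A minimal sketch, assuming a regular, gap-free half-hourly series; daily_windows() is a hypothetical helper, and the 90-day length is made up for illustration. With a daily period of 48, each 1,440-row window holds 30 periods, far above stl()'s two-period minimum. (tsibble's window functions were later deprecated in favour of the slider package, whose slide() family takes .before, .step and .complete arguments for schedules like this; check its documentation for how .step aligns with the first complete window.)

```r
# Sketch: rows feeding each daily re-computation for half-hourly data.
obs_per_day <- 48
lookback    <- 30 * obs_per_day   # 1440 rows = 30 days

daily_windows <- function(n, lookback, step) {
  # End rows: the first complete window, then every `step` rows after it
  ends <- seq(lookback, n, by = step)
  lapply(ends, function(e) (e - lookback + 1):e)
}

n_obs <- 90 * obs_per_day          # e.g. about three months of data
wins  <- daily_windows(n_obs, lookback, obs_per_day)
length(wins)                       # number of daily refreshes
range(wins[[1]])
range(wins[[length(wins)]])
```

Each range can then be passed to a per-window decomposition, and only the anomalies in the last 48 rows of each window need to be kept.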

(N.B. I would have tagged tsibble and anomalize, but I don't have the rep to make those tags)


Answer 1:


Approach 2 should work as expected. The error message comes from stl(), which needs more than two full seasonal periods to estimate the decomposition. For example, daily data with a weekly period needs more than 14 observations for stl() to run, so a window of .size = 7 covers only a single week. Increasing the window size to .size = 7 * 3 works fine.
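The two-period requirement can be checked directly with base R's stl(), which decompose_twitter() calls internally according to the error message in the question. A minimal check, assuming a weekly period on daily data (frequency = 7): a series of exactly 14 observations is rejected, while 21 observations (the .size = 7 * 3 window) decomposes. The random values are placeholders.

```r
set.seed(1)
too_short   <- ts(rnorm(14), frequency = 7)  # exactly two weekly periods
long_enough <- ts(rnorm(21), frequency = 7)  # three periods, like .size = 7 * 3

# stl() stops unless the series is longer than two full periods
msg <- tryCatch(stl(too_short, s.window = "periodic"), error = conditionMessage)
msg

fit <- stl(long_enough, s.window = "periodic", robust = TRUE)
dim(fit$time.series)   # seasonal, trend and remainder columns
```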

my_decomp <- function(...) {
  data <- tibble(...)
  anomalize::decompose_twitter(data, count)
}

library(dplyr)
library(anomalize)
tidyverse_cran_downloads %>%
  group_by(package) %>% 
  tidyr::nest() %>% 
  mutate(diag = purrr::map(data, ~ tsibble::pslide_dfr(., my_decomp, .size = 7 * 3)))
#> # A tibble: 15 x 3
#>    package   data               diag                
#>    <chr>     <list>             <list>              
#>  1 tidyr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  2 lubridate <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  3 dplyr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  4 broom     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  5 tidyquant <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  6 tidytext  <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  7 ggplot2   <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  8 purrr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  9 glue      <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 10 stringr   <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 11 forcats   <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 12 knitr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 13 readr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 14 tibble    <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 15 tidyverse <tibble [425 × 2]> <tibble [8,506 × 5]>
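pslide_dfr() hands each column of the window to the function as a separate argument, which is why my_decomp() collects them with tibble(...) before calling decompose_twitter(). The base-R sketch below mimics that calling convention; pslide_sketch() and rebuild() are hypothetical names, and the real tsibble function additionally handles filling, alignment and row-binding the results.

```r
# Hypothetical stand-in for a pslide-style iterator (not tsibble's code):
# every column of `df` is windowed in lockstep and passed to `f` as a
# separate named argument.
pslide_sketch <- function(df, f, size) {
  starts <- seq_len(nrow(df) - size + 1)
  lapply(starts, function(i) {
    idx <- i:(i + size - 1)
    do.call(f, lapply(df, function(col) col[idx]))
  })
}

# Same idea as tibble(...) inside my_decomp(): rebuild a table from the columns
rebuild <- function(...) data.frame(...)

df  <- data.frame(date = as.Date("2017-01-01") + 0:29, count = 1:30)
out <- pslide_sketch(df, rebuild, size = 7 * 3)
length(out)       # number of complete 21-row windows
names(out[[1]])   # column names survive via the named arguments
```

Since every window returns a full decomposition (one row per observation in the window), the diag column holds many more rows than the original data, as the 8,506-row tibbles above show.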


Source: https://stackoverflow.com/questions/56238837/apply-timeseries-decomposition-and-anomaly-detection-over-a-sliding-tiled-wind
