Detect changes in the seasonal component using bfast

偶尔善良 提交于 2020-12-06 15:22:30

问题


The bfast() function in package bfast is supposed to be able to detect both breakpoints in long-term trends and changes in the seasonal component. One example is this graph (source):

In this graph, subplot no. 2 shows a detected change in seasonality, while no. 3 shows a breakpoint in the trend.

However, I don't understand how to tell bfast() to look for changes/breakpoints in seasonality. All I get is breakpoints in the long-term trend. Here is a reproducible example, simulating a 50-year time series with weekly measurements of the seasonal variable y (i.e., 52 measurements per year):

n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y)  

These data display a constant seasonal trend in the data, with an annual peak around week 13. Now, let us introduce a shift in seasonality in year 25, shifting the seasonal cycle 8 weeks later for the years 26-59:

move_data <- function(data, year, weeks_to_move){
  x <- data[data$Year == year, "y"]
  c(x[seq(52 - weeks_to_move + 1,52)], x[seq(1, 52 - weeks_to_move)])
}

mydata$y_shifted <- mydata$y
for (year in 26:50){
  mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}

The variable y_shifted now has the annual peak around week 13 in years 1-25 and and around week 21 in years 26-52. Let us plot it, compared to the 'unshifted' variable y:

mydata$Phase <- ifelse(mydata$Year <= 25, "Year 1-25", "Year 26-50")
mydata %>%
  tidyr::gather("y_variable", "value", y, y_shifted) %>%
  ggplot(aes(Week, value, group = Year, color = Phase)) + geom_line() +
  facet_grid(.~y_variable)

[Annual cycle of ]y and y_shifted[3]

This abrupt shift in seasonality should be easy to detect. However, when I run `bfast(), it doesn't detect any change:

y_ts <- ts(mydata$y_shifted, start = c(1,1), frequency = freq)
fit <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

As you can see, no change is detected in the seasonality (subplot 2 above). The residuals (subplot 4) picks up the change in seasonality, which is clear if we plot residuals by day-of-the-year:

mydata$Residuals <- fit$output[[1]]$Nt
ggplot(mydata, aes(Week, Residuals, group = Year, color = Phase)) + geom_point()

I have a feeling that there is some parameter or option I need to change in order to make bfast() look for changes in seasonality, but which? I haven't been able to dig out this info from the documentation.


回答1:


I got the same problem when testing bfast on my consumer portfolio data and failed to find any real solution. I went on to dig into the bfast literature from the earth sensing community, which is where bfast was first developed and extensively used. My read is that there was really little you can do to get bfast to always fit a useful seasonal component.

Days ago, I encountered this Quora discussion on the best software for time series analysis and found there is a new R package Rbeast for breakpoint detection and time series decomposition. There is also a nice tweet that shows a quick comparison between bfast and Rbeast.

After some experimenting, I found Rbeast was able to pinpoint seasonal breakpoints in my data as well as yours. Frankly speaking, I have no idea how Rbeast works. The BEAST algorithm in Rbeast seems rather complicated, with tons of outputs; it is not well documented and not as easy to use as bfast. Let me show what I got, first using your data and then using a second artificial time series.

Your data

# The original code to generate your data
n_years <- 50
freq    <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y         <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata    <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y) 

move_data <- function(data, year, weeks_to_move){
  x <- data[data$Year == year, "y"]
  c(x[seq(52 - weeks_to_move + 1,52)], x[seq(1, 52 - weeks_to_move)])
}

mydata$y_shifted <- mydata$y
for (year in 26:50){
  mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}

# You data analyzed by the BEAST algorithm in Rbeast
library(Rbeast) 
# Rbeast's input should be a data vector not a ts object.
# '52' is the frequency (called period in BEAST)
fit <- beast(mydata$y_shifted,52)
plot(fit)

# another way to run BEAST by customizing the parameters explicitly
opt$period=52         # ts frequency/period
opt$minSeasonOrder=1  # min harmonic order used to fit seasonal cmpnt
opt$maxSeasonOrder=5  # max harmonic order used to fit seasonal cmpnt
fit <- beast(mydata$y_shifted,opt)
plot(fit)

The abrupt seasonal shift was detected precisely. Rbeast also gives the probability of detecting breakpoints in seasonality and trend (the black curves in the subplots of the above figure). The probability for the detected seasonal shift is very high, more than 0.91. Your data has a constant trend (i.e., no trend). Rbeast found no breakpoint in the trend, but the fitted trend appears nonlinear (the gray envelope is the confidence interval). I guess the non-linearity is because BEAST averages many individual trends together. The probability of detecting breaks in the trend also looks very weird in the above plot. That turns out to be an illusion due to the y scaling.

I re-plotted the trend result on a normal y scale.

par( mar = c(3, 5, 3, 2), mfrow=c(2,1) )
plot( fit$t,  main='trend',  type='l', ylim=c(-0.1,0.1) )
plot( fit$tProb,main='changepoint probability in trend',type='l', ylim=c(0,1) )

The figure below shows that the trend is essentially a constant of zero and the probability of finding breakpoints (i.e., changepoints as used in Rbeast) in trend is also close to zero all over.



A second time series

A cool feature of Rbeast is the estimation of sin/cos orders for the harmonic seasonal model. Below I generated a time series that has three segments in the seasonal component (i.e., two breaks) plus a sloped trend with no breaks. The three seasonal segments have different sin orders, taking 1, 2, and 3, respectively.

# Generate a sample time series with three seasonal segments
# the sin/cos orders for the three segs are different.
seg1 <- 1:1000
seg2 <- 1001:2000
seg3 <- 2001:3000
new_data <- c( sin(seg1*2*pi/52), 0.6*sin( seg2*2*pi/52*2), 0.3*sin( seg3*2*pi/52*3)) + (1:3000)*0.0002+ rnorm(3000, sd = 0.1)
# Test bfast using new_data
y_ts <- ts(new_data, start = c(1,1), frequency = 52)
fit  <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

Surprisingly enough, bfast didn't detect any breaks in seasonality, although the three segments are easily eye-balled in the plotted data Yt.

# Analyze the new_data time series using Rbeast

opt=list()
opt$period=52
opt$minSeasonOrder=1
opt$maxSeasonOrder=4
opt$samples = 6000  
opt$computeHarmonicOrder = 1 # "1" asks BEAST to output seasonal order
fit <- beast(new_data ,opt)
plot(fit)

The above is the Rbeast result. The two breaks and the three seasonal segments are recovered. Again, there are no breaks in the trend. The breakpoint probability for the trend shows there may be some breakpoints, but the probability curve is actually close to zero and not plotted in a normal y range.

Setting "opt$computeHarmonicOrder " tells beast to save estimated seasonal harmonic orders to fit$horder. The figure below is the output. The three sin orders are also recovered well. This curve also shows the locations of the two seasonal breaks.

plot(fit$horder,type='l')



来源:https://stackoverflow.com/questions/52708697/detect-changes-in-the-seasonal-component-using-bfast

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!