Averaging daily data into weekly data

我是研究僧i 提交于 2019-11-29 11:05:03

Updated answer

Based on the OP's update on the request, I modified the code to aggregate the data over the date of a defined day of each week (Saturday). This time I only use functions available in base R. It ignores NAs (if there are only NAs for a given End_of_Week-Climate_Division you get NaN, not a number).

# Data with another Climate division as example (same daily values and dates)
CADaily <-
structure(list(Climate_Division = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Date = structure(c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L), .Label = c("01/07/1948", "02/07/1948", "03/07/1948", 
"04/07/1948", "05/07/1948", "06/07/1948", "17/07/1948", "18/07/1948", 
"20/07/1948", "22/07/1948"), class = "factor"), Rain = c(0.875, 
2.9166667, 0.7916667, 0.4305556, 0.8262061, 0.5972222, 0.04166667, 
0.08333333, 0.04166667, 0.125, 0.875, 2.9166667, 0.7916667, 0.4305556, 
0.8262061, 0.5972222, 0.04166667, 0.08333333, 0.04166667, 0.125
), week = c(27, 27, 27, 27, 27, 27, 29, 29, 29, 30, 27, 27, 27, 
27, 27, 27, 29, 29, 29, 30)), .Names = c("Climate_Division", 
"Date", "Rain", "week"), row.names = c(NA, 20L), class = "data.frame")

# Coerce to Date class
CADaily$Date <- as.Date(x=CADaily$Date, format='%d/%m/%Y')

# Extract day of the week (Saturday = 6)
CADaily$Week_Day <- as.numeric(format(CADaily$Date, format='%w'))

# Adjust end-of-week date (first saturday from the original Date)
CADaily$End_of_Week <- CADaily$Date + (6 - CADaily$Week_Day)

# Aggregate over week and climate division
aggregate(Rain~End_of_Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)

# Output
#   End_of_Week Climate_Division       Rain
# 1  1948-07-03                1 1.52777780
# 2  1948-07-10                1 0.61799463
# 3  1948-07-17                1 0.04166667
# 4  1948-07-24                1 0.08333333
# 5  1948-07-03                2 1.52777780
# 6  1948-07-10                2 0.61799463
# 7  1948-07-17                2 0.04166667
# 8  1948-07-24                2 0.08333333

Additional operations

Also, using this code you can obtain results from additional aggregation functions, assuming the result is an atomic vector of the same length for every week-division pair.

# Aggregate over week and climate division, and show the total number of
# observations per week, the number of observations which represent missing
# values, the average, and the standard deviation.
aggregate(Rain~End_of_Week+Climate_Division, data=CADaily,
          FUN=function(x) c(n=length(x),
                            NAs=sum(is.na(x)),
                            Average=mean(x, na.rm=TRUE),
                            SD=sd(x, na.rm=TRUE)))

# Output. You get NA for the standard deviation if there is only one observation.
#   End_of_Week Climate_Division     Rain.n   Rain.NAs Rain.Average    Rain.SD
# 1  1948-07-03                1 3.00000000 0.00000000   1.52777780 1.20353454
# 2  1948-07-10                1 3.00000000 0.00000000   0.61799463 0.19864151
# 3  1948-07-17                1 1.00000000 0.00000000   0.04166667         NA
# 4  1948-07-24                1 3.00000000 0.00000000   0.08333333 0.04166667
# 5  1948-07-03                2 3.00000000 0.00000000   1.52777780 1.20353454
# 6  1948-07-10                2 3.00000000 0.00000000   0.61799463 0.19864151
# 7  1948-07-17                2 1.00000000 0.00000000   0.04166667         NA
# 8  1948-07-24                2 3.00000000 0.00000000   0.08333333 0.04166667



Original answer

Try with the lubridate package. Load it, and then aggregate (kept for the record as part of the original answer, which reflected the OP's request to aggregate by week).

# Load lubridate package
library(package=lubridate)

# Set Weeks number. Date already of class `Date`
CADaily$Week <- week(CADaily$Date)

# Aggregate over week number and climate division
aggregate(Rain~Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)

# Output
#   Week Climate_Division       Rain
# 1   27                1 1.07288622
# 2   29                1 0.05555556
# 3   30                1 0.12500000
# 4   27                2 1.07288622
# 5   29                2 0.05555556
# 6   30                2 0.12500000

xts is great for such manipulations. Use endpoints to subset data, then sapply to treat it weekly.

CADaily <- read.table(text ='     Climate_Division       Date      Rain
      885                1 1948-07-01 0.8750000
      892                1 1948-07-02 2.9166667
      894                1 1948-07-03 0.7916667
      895                1 1948-07-04 0.4305556
      898                1 1948-07-05 0.8262061
      901                1 1948-07-06 0.5972222
      904                1 1948-07-17 0.04166667
      905                1 1948-07-18 0.08333333
      907                1 1948-07-20 0.04166667
      909                1 1948-07-22 0.12500000',head=T)
dat.xts <- xts(CADaily[,-2], order.by= as.POSIXct(CADaily[,2]))
INDEX <- endpoints(dat.xts, 'weeks')

lapply(1:(length(INDEX) - 1), function(y) {
    y <- dat.xts[(INDEX[y] + 1):INDEX[y + 1]]
    data.frame(y$Climate_Division,mean(y$Rain))

  })

My result is a list by week:

[[1]]
           Climate_Division mean.y.Rain.
1948-07-01                1     1.168019
1948-07-02                1     1.168019
1948-07-03                1     1.168019
1948-07-04                1     1.168019
1948-07-05                1     1.168019

[[2]]
           Climate_Division mean.y.Rain.
1948-07-06                1    0.5972222

[[3]]
           Climate_Division mean.y.Rain.
1948-07-17                1       0.0625
1948-07-18                1       0.0625

[[4]]
           Climate_Division mean.y.Rain.
1948-07-20                1   0.08333334
1948-07-22                1   0.08333334

I backtrack from my previous answer. I think this one is much simpler.

You just need to find what is coming weekend date for each row, and then aggregate

CADaily <- read.table(text = "Climate_Division       Date      Rain\n1 1948-07-01 0.8750000\n1 1948-07-02 2.9166667\n1 1948-07-03 0.7916667\n1 1948-07-04 0.4305556\n1 1948-07-05 0.8262061\n1 1948-07-06 0.5972222\n1 1948-07-17 0.04166667\n1 1948-07-18 0.08333333\n1 1948-07-20 0.04166667\n1 1948-07-22 0.12500000\n2 1948-07-01 0.8750000\n2 1948-07-02 2.9166667\n2 1948-07-03 0.7916667\n2 1948-07-04 0.4305556\n2 1948-07-05 0.8262061\n2 1948-07-06 0.5972222\n2 1948-07-17 0.04166667\n2 1948-07-18 0.08333333\n2 1948-07-20 0.04166667\n2 1948-07-22 0.12500000", 
    head = T)

CADaily$weekend <- as.POSIXlt(CADaily$Date) + (7 - as.POSIXlt(CADaily$Date)$wday) * 24 * 60 * 60

aggregate(Rain ~ weekend + Climate_Division, data = CADaily, FUN = mean)
##      weekend Climate_Division       Rain
## 1 1948-07-04                1 1.52777780
## 2 1948-07-11                1 0.61799463
## 3 1948-07-18                1 0.04166667
## 4 1948-07-25                1 0.08333333
## 5 1948-07-04                2 1.52777780
## 6 1948-07-11                2 0.61799463
## 7 1948-07-18                2 0.04166667
## 8 1948-07-25                2 0.08333333
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!