Averaging daily data into weekly data

人盡茶涼 提交于 2019-11-28 04:35:47

问题


I am wondering if there is a way to average daily data into weekly data. The dataframe that I call CADaily looks like this:

      > CADaily[1:10, ]
          Climate_Division       Date      Rain
      885                1 1948-07-01 0.8750000
      892                1 1948-07-02 2.9166667
      894                1 1948-07-03 0.7916667
      895                1 1948-07-04 0.4305556
      898                1 1948-07-05 0.8262061
      901                1 1948-07-06 0.5972222
      904                1 1948-07-17 0.04166667
      905                1 1948-07-18 0.08333333
      907                1 1948-07-20 0.04166667
      909                1 1948-07-22 0.12500000
      910                1 1948-07-21 NA

My objective is similar to the aggregate function to find the average of the daily rain into weekly rain values base on the Date (of course) and the Climate_Division (ranges from 1 to 7). I was searching online and I came across a code that I was able to use but wasn't quite to par to my objective:

      apply.weekly(xts(CADaily[,-2], order.by= CADaily[,2]), FUN = mean)

This does what I would like it to do, however my column Climate_Division is also averaged. I would simply like to average Rain only and order it based on the Climate_Division followed by Date. Is there a way that I can possibly do it as follows:

      aggregate(CADaily, by =list(CADaily$Climate_Division, CADaily$Date), FUN = mean, na.rm = TRUE)

where Date is in some form of weeks? Or is there another way?


EDIT:

Dear All,

Thank you for your help. Perhaps using aggregate wasn't the best way to go about this as I originally thought. In terms of output, I wanted to obtain the weekly average of rain throughout the years of the data (1948 - 1995). In other words, I wanted to get a nice format that I can input into a time series that has the form of the date of the end of the week. The output that I am looking for (keeping in mind that there may exist NA values) is:

      Climate_Division     Date          Rain
      1                    1948-07-03    1.527778
      1                    1948-07-10    0.6179946
      1                    1948-07-17    0.04166667
      1                    1948-07-24    0.08333333
      ...
      1                    1995-12-23    0.24513245
      1                    1995-12-30    0.12450545

Or is there a better way of expressing weekly data that is represented by dates?

Thank you for your help.


回答1:


Updated answer

Based on the OP's update on the request, I modified the code to aggregate the data over the date of a defined day of each week (Saturday). This time I only use functions available in base R. It ignores NAs (if there are only NAs for a given End_of_Week-Climate_Division you get NaN, not a number).

# Data with another Climate division as example (same daily values and dates)
CADaily <-
structure(list(Climate_Division = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Date = structure(c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L), .Label = c("01/07/1948", "02/07/1948", "03/07/1948", 
"04/07/1948", "05/07/1948", "06/07/1948", "17/07/1948", "18/07/1948", 
"20/07/1948", "22/07/1948"), class = "factor"), Rain = c(0.875, 
2.9166667, 0.7916667, 0.4305556, 0.8262061, 0.5972222, 0.04166667, 
0.08333333, 0.04166667, 0.125, 0.875, 2.9166667, 0.7916667, 0.4305556, 
0.8262061, 0.5972222, 0.04166667, 0.08333333, 0.04166667, 0.125
), week = c(27, 27, 27, 27, 27, 27, 29, 29, 29, 30, 27, 27, 27, 
27, 27, 27, 29, 29, 29, 30)), .Names = c("Climate_Division", 
"Date", "Rain", "week"), row.names = c(NA, 20L), class = "data.frame")

# Coerce to Date class
CADaily$Date <- as.Date(x=CADaily$Date, format='%d/%m/%Y')

# Extract day of the week (Saturday = 6)
CADaily$Week_Day <- as.numeric(format(CADaily$Date, format='%w'))

# Adjust end-of-week date (first saturday from the original Date)
CADaily$End_of_Week <- CADaily$Date + (6 - CADaily$Week_Day)

# Aggregate over week and climate division
aggregate(Rain~End_of_Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)

# Output
#   End_of_Week Climate_Division       Rain
# 1  1948-07-03                1 1.52777780
# 2  1948-07-10                1 0.61799463
# 3  1948-07-17                1 0.04166667
# 4  1948-07-24                1 0.08333333
# 5  1948-07-03                2 1.52777780
# 6  1948-07-10                2 0.61799463
# 7  1948-07-17                2 0.04166667
# 8  1948-07-24                2 0.08333333

Additional operations

Also, using this code you can obtain results from additional aggregation functions, assuming the result is an atomic vector of the same length for every week-division pair.

# Aggregate over week and climate division, and show the total number of
# observations per week, the number of observations which represent missing
# values, the average, and the standard deviation.
aggregate(Rain~End_of_Week+Climate_Division, data=CADaily,
          FUN=function(x) c(n=length(x),
                            NAs=sum(is.na(x)),
                            Average=mean(x, na.rm=TRUE),
                            SD=sd(x, na.rm=TRUE)))

# Output. You get NA for the standard deviation if there is only one observation.
#   End_of_Week Climate_Division     Rain.n   Rain.NAs Rain.Average    Rain.SD
# 1  1948-07-03                1 3.00000000 0.00000000   1.52777780 1.20353454
# 2  1948-07-10                1 3.00000000 0.00000000   0.61799463 0.19864151
# 3  1948-07-17                1 1.00000000 0.00000000   0.04166667         NA
# 4  1948-07-24                1 3.00000000 0.00000000   0.08333333 0.04166667
# 5  1948-07-03                2 3.00000000 0.00000000   1.52777780 1.20353454
# 6  1948-07-10                2 3.00000000 0.00000000   0.61799463 0.19864151
# 7  1948-07-17                2 1.00000000 0.00000000   0.04166667         NA
# 8  1948-07-24                2 3.00000000 0.00000000   0.08333333 0.04166667



Original answer

Try with the lubridate package. Load it, and then aggregate (kept for the record as part of the original answer, which reflected the OP's request to aggregate by week).

# Load lubridate package
library(package=lubridate)

# Set Weeks number. Date already of class `Date`
CADaily$Week <- week(CADaily$Date)

# Aggregate over week number and climate division
aggregate(Rain~Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)

# Output
#   Week Climate_Division       Rain
# 1   27                1 1.07288622
# 2   29                1 0.05555556
# 3   30                1 0.12500000
# 4   27                2 1.07288622
# 5   29                2 0.05555556
# 6   30                2 0.12500000



回答2:


xts is great for such manipulations. Use endpoints to subset data, then sapply to treat it weekly.

CADaily <- read.table(text ='     Climate_Division       Date      Rain
      885                1 1948-07-01 0.8750000
      892                1 1948-07-02 2.9166667
      894                1 1948-07-03 0.7916667
      895                1 1948-07-04 0.4305556
      898                1 1948-07-05 0.8262061
      901                1 1948-07-06 0.5972222
      904                1 1948-07-17 0.04166667
      905                1 1948-07-18 0.08333333
      907                1 1948-07-20 0.04166667
      909                1 1948-07-22 0.12500000',head=T)
dat.xts <- xts(CADaily[,-2], order.by= as.POSIXct(CADaily[,2]))
INDEX <- endpoints(dat.xts, 'weeks')

lapply(1:(length(INDEX) - 1), function(y) {
    y <- dat.xts[(INDEX[y] + 1):INDEX[y + 1]]
    data.frame(y$Climate_Division,mean(y$Rain))

  })

My result is a list by week:

[[1]]
           Climate_Division mean.y.Rain.
1948-07-01                1     1.168019
1948-07-02                1     1.168019
1948-07-03                1     1.168019
1948-07-04                1     1.168019
1948-07-05                1     1.168019

[[2]]
           Climate_Division mean.y.Rain.
1948-07-06                1    0.5972222

[[3]]
           Climate_Division mean.y.Rain.
1948-07-17                1       0.0625
1948-07-18                1       0.0625

[[4]]
           Climate_Division mean.y.Rain.
1948-07-20                1   0.08333334
1948-07-22                1   0.08333334



回答3:


I backtrack from my previous answer. I think this one is much simpler.

You just need to find what is coming weekend date for each row, and then aggregate

CADaily <- read.table(text = "Climate_Division       Date      Rain\n1 1948-07-01 0.8750000\n1 1948-07-02 2.9166667\n1 1948-07-03 0.7916667\n1 1948-07-04 0.4305556\n1 1948-07-05 0.8262061\n1 1948-07-06 0.5972222\n1 1948-07-17 0.04166667\n1 1948-07-18 0.08333333\n1 1948-07-20 0.04166667\n1 1948-07-22 0.12500000\n2 1948-07-01 0.8750000\n2 1948-07-02 2.9166667\n2 1948-07-03 0.7916667\n2 1948-07-04 0.4305556\n2 1948-07-05 0.8262061\n2 1948-07-06 0.5972222\n2 1948-07-17 0.04166667\n2 1948-07-18 0.08333333\n2 1948-07-20 0.04166667\n2 1948-07-22 0.12500000", 
    head = T)

CADaily$weekend <- as.POSIXlt(CADaily$Date) + (7 - as.POSIXlt(CADaily$Date)$wday) * 24 * 60 * 60

aggregate(Rain ~ weekend + Climate_Division, data = CADaily, FUN = mean)
##      weekend Climate_Division       Rain
## 1 1948-07-04                1 1.52777780
## 2 1948-07-11                1 0.61799463
## 3 1948-07-18                1 0.04166667
## 4 1948-07-25                1 0.08333333
## 5 1948-07-04                2 1.52777780
## 6 1948-07-11                2 0.61799463
## 7 1948-07-18                2 0.04166667
## 8 1948-07-25                2 0.08333333


来源:https://stackoverflow.com/questions/15102327/averaging-daily-data-into-weekly-data

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!