Question
I have a data.frame in R where one column is a list of dates (many of which are duplicates), and the other column is a temperature recorded on that date. The columns in question look like this (the real data has several thousand rows and a few other columns I don't need):
Date | Temp
-----------------
1/2/13 34.4
1/2/13 36.4
1/2/13 34.3
1/4/13 45.6
1/4/13 33.5
1/5/13 45.2
I need to find a way of getting a daily average for temperature. So ideally, I could tell R to loop through the data.frame and for every date that matched, give me an average for the temperature that day. I've been googling and I know loops in R are possible, but I can't wrap my head around this conceptually given what little I know about R code.
I know I can pull out a single column and average it (i.e. mean(data.frame[[2]])), but I'm utterly lost on how to tell R to match that mean to a single value located in the first column.
Additionally, how could I generate an average for every seven calendar days (regardless of how many entries exist for a single day)? So, a seven-day rolling average, i.e. if my date range starts at 1/1/13 I'd get an average for all temps taken between 1/1/13 and 1/7/13, and then between 1/8/13 and 1/14/13, and so on...
Any assistance helping me grasp R loops is much appreciated. Thank you!
EDIT
Here's the output of dput(head(my.dataframe))
PLEASE NOTE: I edited down both "date" and "timestamp" because they both go on for several thousand entries otherwise:
structure(list(RECID = 579:584, SITEID = c(101L, 101L, 101L,
101L, 101L, 101L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L), DAY = c(7L,
7L, 7L, 7L, 7L, 7L), DATE = structure(c(34L, 34L, 34L, 34L, 34L,
34L), .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013",
"10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013",
"10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013","9/9/2013"), class = "factor"), TIMESTAMP = structure(784:789, .Label = c("10/1/2013 0:00",
"10/1/2013 1:00", "10/1/2013 10:00", "10/1/2013 11:00", "10/1/2013 12:00",
"10/1/2013 13:00", "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00",
"10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00", "10/1/2013 2:00"), class = "factor"), TEMP = c(23.376, 23.376, 23.833, 24.146,
24.219, 24.05), X.C = c(NA, NA, NA, NA, NA, NA)), .Names = c("RECID",
"SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"), row.names = c(NA,
6L), class = "data.frame")
Answer 1:
library(plyr)
ddply(df, .(Date), summarize, daily_mean_Temp = mean(Temp))
This is a simple example of the Split-Apply-Combine paradigm.
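To make the paradigm concrete without any packages, here is a minimal base R sketch of the same three steps. The data frame is a toy copy of the example from the question, and the names df, temps_by_date, means and daily are illustrative assumptions, not part of plyr.

```r
# Toy data mirroring the question's example (column names are assumed)
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2),
  stringsAsFactors = FALSE
)

# Split: one numeric vector of temperatures per distinct date
temps_by_date <- split(df$Temp, df$Date)

# Apply: compute the mean of each piece
means <- sapply(temps_by_date, mean)

# Combine: reassemble the per-date results into a data frame
daily <- data.frame(Date = names(means), daily_mean_Temp = unname(means))
print(daily)
```

No explicit loop is needed: split() does the matching of temperatures to dates, and sapply() visits each group for you.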
Alternative #1: as Ananda Mahto mentions, the dplyr package is a higher-performance rewrite of plyr. He shows the syntax.
Alternative #2: aggregate() is also functionally equivalent; it just has fewer bells and whistles than plyr/dplyr.
Additionally, regarding 'generate an average for every 7 calendar days': do you mean 'average by week of year', or a 'moving 7-day average (trailing/leading/centered)'?
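Until that is clarified, here is a rough base R sketch of two possible readings: fixed 7-day blocks counted from the first date, and a trailing 7-day window ending on each observed day. It assumes the dates are month/day/two-digit-year; the names block_start, block_means and trailing are made up for illustration.

```r
# Toy data mirroring the question's example (column names are assumed)
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2),
  stringsAsFactors = FALSE
)

# Assumption: dates are month/day/two-digit-year
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# Reading 1: fixed, non-overlapping 7-day blocks counted from the first date.
# Every reading in a block counts equally, however many there are per day.
days_since_start <- as.integer(df$Date - min(df$Date))
df$block_start <- min(df$Date) + 7 * (days_since_start %/% 7)
block_means <- aggregate(Temp ~ block_start, data = df, FUN = mean)

# Reading 2: trailing window, i.e. for each observed day, the mean of all
# readings taken on that day or in the six calendar days before it
days <- sort(unique(df$Date))
trailing <- data.frame(
  Date = days,
  mean_7d = sapply(seq_along(days), function(i) {
    d <- days[i]
    mean(df$Temp[df$Date > d - 7 & df$Date <= d])
  })
)

print(block_means)
print(trailing)
```

Because the toy data spans only four days, the first reading yields a single block; on the real data each block_start value marks the first day of a 7-day span.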
Answer 2:
Here are a few options:
aggregate(Temp ~ Date, mydf, mean)
# Date Temp
# 1 1/2/13 35.03333
# 2 1/4/13 39.55000
# 3 1/5/13 45.20000
library(dplyr)
# %.% was dplyr's original chaining operator and has since been removed; %>% replaces it
mydf %>% group_by(Date) %>% summarise(mean(Temp))
# Source: local data frame [3 x 2]
#
# Date mean(Temp)
# 1 1/2/13 35.03333
# 2 1/4/13 39.55000
# 3 1/5/13 45.20000
library(data.table)
DT <- data.table(mydf)
DT[, mean(Temp), by = Date]
# Date V1
# 1: 1/2/13 35.03333
# 2: 1/4/13 39.55000
# 3: 1/5/13 45.20000
library(xts)
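# Give as.Date() an explicit format (assumed here to be month/day/two-digit-year);
# without it, "1/2/13" is parsed as year 1, February 13
dfX <- xts(mydf$Temp, as.Date(mydf$Date, format = "%m/%d/%y"))
apply.daily(dfX, mean)
#                [,1]
# 2013-01-02 35.03333
# 2013-01-04 39.55000
# 2013-01-05 45.20000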
Since you are dealing with dates, you should explore the xts package, which will give you access to functions like apply.daily, apply.weekly, apply.monthly and so on, which will let you conveniently aggregate your data.
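If installing xts is not an option, a similar result can be sketched in base R once the dates are real Date objects. The example below is a package-free counterpart, not xts itself. It uses the DATE and TEMP column names from the question's dput() output, but the rows are hypothetical sample values, and the "%m/%d/%Y" format is an assumption based on the factor levels shown there.

```r
# Hypothetical sample using the column names from the question's dput() output
my.dataframe <- data.frame(
  DATE = factor(c("10/1/2013", "10/1/2013", "6/7/2013", "6/7/2013", "6/10/2013")),
  TEMP = c(20.0, 22.0, 23.376, 23.833, 25.0)
)

# Convert factor -> character -> Date; "%Y" matches the four-digit year
my.dataframe$DATE <- as.Date(as.character(my.dataframe$DATE), format = "%m/%d/%Y")

# Daily means, now sorted chronologically rather than alphabetically
daily <- aggregate(TEMP ~ DATE, data = my.dataframe, FUN = mean)

# Monthly means: group on a "YYYY-MM" key derived from each date
my.dataframe$month <- format(my.dataframe$DATE, "%Y-%m")
monthly <- aggregate(TEMP ~ month, data = my.dataframe, FUN = mean)

print(daily)
print(monthly)
```

Converting the factor first matters because factor levels sort as text, so "10/1/2013" would otherwise come before "6/10/2013" in the output.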
Source: https://stackoverflow.com/questions/23179336/compute-data-frame-column-averages-by-date