I have a data.table with a row for each day over a 30-year period, with a number of different variable columns. The reason for using data.table is that the .csv file I'm usi
Since you said in your question that you would be open to a completely new solution, you could try the following with dplyr:
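library(dplyr)

# Parse the dates and build the grouping keys
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df$Year.Month <- format(df$Date, "%Y-%m")  # e.g. "1985-01": one calendar month of one year
df$Month <- format(df$Date, "%m")          # e.g. "01": that month pooled over all years

df %>%
  group_by(Key, Year.Month, Month) %>%
  summarize(Runoff = sum(Runoff)) %>%   # total runoff per Key for each month of each year
  ungroup() %>%
  group_by(Key, Month) %>%
  summarize(mean(Runoff))               # mean of those monthly totals across years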
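To make the two-step logic concrete, here is a minimal self-contained sketch on made-up toy data (the question's actual data is not reproduced here, so the values, and the column names mean_Runoff and mean_daily, are hypothetical). Grouping by Year.Month first gives one total per Key and calendar month of each year; the second grouping by Month then averages those totals across years. Grouping by Month alone in the first step would instead sum each month over the whole period.

```r
library(dplyr)

# Hypothetical toy data (not the question's data): daily runoff for two
# keys, two days in January of each of two years.
df <- data.frame(
  Key    = rep(c("A", "B"), each = 4),
  Date   = rep(c("1985-01-01", "1985-01-02", "1986-01-01", "1986-01-02"), times = 2),
  Runoff = c(1, 2, 3, 4, 0.5, 1.5, 2, 2),
  stringsAsFactors = FALSE
)

# Same preparation as in the answer
df$Date       <- as.Date(df$Date, format = "%Y-%m-%d")
df$Year.Month <- format(df$Date, "%Y-%m")
df$Month      <- format(df$Date, "%m")

monthly_means <- df %>%
  group_by(Key, Year.Month, Month) %>%
  summarize(Runoff = sum(Runoff)) %>%        # key A: 3 (1985-01) and 7 (1986-01)
  ungroup() %>%
  group_by(Key, Month) %>%
  summarize(mean_Runoff = mean(Runoff)) %>%  # key A: (3 + 7) / 2 = 5
  ungroup()

# Averaging the daily values directly answers a different question:
# the mean *daily* runoff (2.5 for key A), not the mean monthly total.
daily_means <- df %>%
  group_by(Key, Month) %>%
  summarize(mean_daily = mean(Runoff)) %>%
  ungroup()

print(monthly_means)
print(daily_means)
```

The daily_means comparison is only there to show why the intermediate monthly sum matters.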
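EDIT #1 (after a comment by @Henrik): The same result can be obtained without the explicit ungroup() and second group_by(), because each summarize() drops the last grouping variable: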
df %>%
  group_by(Key, Month, Year.Month) %>%  # Year.Month last, so it is the group dropped next
  summarize(Runoff = sum(Runoff)) %>%   # result is still grouped by Key and Month
  summarize(mean(Runoff))
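A minimal sketch of why the shorter version works, on hypothetical toy data (the names toy, totals and result are illustrative only): each summarize() call removes the last grouping variable, here Year.Month, which group_vars() makes visible.

```r
library(dplyr)

# Hypothetical toy data: one key, January of two years
toy <- data.frame(
  Key        = "A",
  Month      = "01",
  Year.Month = c("1985-01", "1985-01", "1986-01"),
  Runoff     = c(1, 2, 7),
  stringsAsFactors = FALSE
)

totals <- toy %>%
  group_by(Key, Month, Year.Month) %>%
  summarize(Runoff = sum(Runoff))   # one row per year: 3 and 7

# summarize() peeled off the last grouping variable, Year.Month
group_vars(totals)                  # returns c("Key", "Month")

# so the next summarize() averages the yearly totals within Key and Month
result <- totals %>% summarize(mean_Runoff = mean(Runoff))   # (3 + 7) / 2 = 5
print(result)
```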
EDIT #2, to round things off: here is another way of doing it, in which the second grouping is stated explicitly (thanks to @Henrik for his comments):
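df %>%
  group_by(Key, Month, Year.Month) %>%
  summarize(Runoff = sum(Runoff)) %>%
  group_by(Key, Month, .add = FALSE) %>%  # regroup by Key and Month only, not Year.Month
  summarize(mean(Runoff))
# .add = FALSE is the default and replaces the existing groups;
# dplyr versions before 1.0.0 spelled this argument add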
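If you are on dplyr 1.0.0 or later, summarize() also accepts a .groups argument, which states the regrouping explicitly without a second group_by(). A sketch on hypothetical toy data, assuming a recent dplyr:

```r
library(dplyr)

# Hypothetical toy data: two keys, January of two years
toy <- data.frame(
  Key        = c("A", "A", "A", "B", "B"),
  Month      = "01",
  Year.Month = c("1985-01", "1985-01", "1986-01", "1985-01", "1986-01"),
  Runoff     = c(1, 2, 7, 2, 4),
  stringsAsFactors = FALSE
)

result <- toy %>%
  group_by(Key, Month, Year.Month) %>%
  summarize(Runoff = sum(Runoff), .groups = "drop_last") %>%  # keep Key and Month
  summarize(mean_Runoff = mean(Runoff), .groups = "drop")     # return an ungrouped tibble

print(result)   # key A: (3 + 7) / 2 = 5, key B: (2 + 4) / 2 = 3
```

Spelling out .groups also silences the "summarise() has grouped output" message that recent dplyr versions print.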
It produces the following result:
# Source: local data frame [2 x 3]
# Groups: Key
#
#   Key Month mean(Runoff)
# 1   A    01     4.366667
# 2   B    01     3.266667
You can then reshape this long result into your desired wide layout (one column per month), e.g. with reshape2. Suppose you stored the output of the above operation in a data frame df2; then you could do:
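library(reshape2)
# Each Key/Month pair has exactly one value, so sum() simply passes it through
df2 <- dcast(as.data.frame(df2), Key ~ Month, fun.aggregate = sum, value.var = "mean(Runoff)")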
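A small sketch of the reshaping step, using a hypothetical df2 (values made up, with a February row added so the wide layout has more than one month column). The value column is created with check.names = FALSE so that it keeps its literal name mean(Runoff), as in the dplyr output:

```r
library(reshape2)

# Hypothetical long-format df2: one row per Key and Month
df2 <- data.frame(
  Key            = c("A", "A", "B", "B"),
  Month          = c("01", "02", "01", "02"),
  `mean(Runoff)` = c(4.4, 2.1, 3.3, 1.8),
  check.names    = FALSE,
  stringsAsFactors = FALSE
)

# One row per Key, one column per Month value ("01", "02")
wide <- dcast(df2, Key ~ Month, fun.aggregate = sum, value.var = "mean(Runoff)")
print(wide)
```

Because the month columns are named "01", "02", and so on, refer to them with backticks or [[ ]] (e.g. wide[["01"]]) afterwards.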