Question
I have a data frame with values associated with a year and month. I use the yearmon class from the zoo package to store the year-month info.
My aim is to compute the average of the values that share the same year-month. However, using dplyr gives me an error.
The variable tst below reproduces the problem:
> str(tst)
'data.frame': 20 obs. of 2 variables:
$ n : int 23 24 26 27 26 23 19 19 22 22 ...
$ ym:Class 'yearmon' num [1:20] 2004 2004 2004 2004 2004 ...
> dput(tst)
structure(list(n = c(23L, 24L, 26L, 27L, 26L, 23L, 19L, 19L,
22L, 22L, 22L, 22L, 26L, 26L, 19L, 22L, 26L, 25L, 22L, 18L),
ym = structure(c(2004, 2004, 2004, 2004, 2004.08333333333,
2004.08333333333, 2004.08333333333, 2004.08333333333, 2004.08333333333,
2004.16666666667, 2004.16666666667, 2004.16666666667, 2004.16666666667,
2004.25, 2004.25, 2004.25, 2004.25, 2004.33333333333, 2004.33333333333,
2004.33333333333), class = "yearmon")), .Names = c("n", "ym"
), row.names = c(NA, 20L), class = "data.frame")
And the error was
> tst %>% group_by(ym) %>% summarize(ave=mean(n))
Error: column 'ym' has unsupported type : yearmon
Is there a way to make it work with both zoo and dplyr, or will I have to encode my year-month separately?
Answer 1:
As the error says, the yearmon class is not supported by dplyr. We can convert ym to a class that dplyr supports, and then it will work:
library(dplyr)
tst %>%
group_by(ym = as.numeric(ym)) %>%
summarise(ave = mean(n))
# ym ave
#1 2004.000 25.00000
#2 2004.083 21.80000
#3 2004.167 23.00000
#4 2004.250 23.25000
#5 2004.333 21.66667
Or, as @G.Grothendieck mentioned in the comments, we can replace the group_by call with group_by(ym = as.Date(ym)) or group_by(ym = format(ym, "%Y-%m")).
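The two alternatives from the comments can be sketched as follows. This is a minimal example that rebuilds tst from the values in the question rather than from the dput output; the names by_date and by_label are introduced here only for illustration.

```r
library(zoo)
library(dplyr)

# Rebuild the example data from the question (20 rows, 5 distinct months).
tst <- data.frame(
  n = c(23L, 24L, 26L, 27L, 26L, 23L, 19L, 19L, 22L, 22L,
        22L, 22L, 26L, 26L, 19L, 22L, 26L, 25L, 22L, 18L)
)
tst$ym <- as.yearmon(2004 + rep(0:4, times = c(4, 5, 4, 4, 3)) / 12)

# Option 1: group by a Date (as.Date on a yearmon gives the first day of the month).
by_date <- tst %>%
  group_by(ym = as.Date(ym)) %>%
  summarise(ave = mean(n))

# Option 2: group by a "YYYY-MM" character label.
by_label <- tst %>%
  group_by(ym = format(ym, "%Y-%m")) %>%
  summarise(ave = mean(n))

print(by_date)
print(by_label)
```

The Date version keeps the grouping column sortable and usable as a time axis, while the character version is only a label; which one is preferable depends on what happens to the summary afterwards.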
Answer 2:
Maybe you asked this question before dplyr 0.4.3 was released, but I found that upgrading to this version got rid of the error.
(A colleague was using dplyr 0.4.2, which also worked :)
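To see which case applies on a given installation, one option is to print the installed dplyr version and try grouping by the yearmon column directly, falling back to a Date conversion if that fails. This is only a sketch: the fallback logic and the name direct are assumptions made for illustration, and the thread does not state which dplyr versions raise the error.

```r
library(zoo)
library(dplyr)

# Show the installed dplyr version for comparison with the versions mentioned above.
print(packageVersion("dplyr"))

# Rebuild the example data from the question.
tst <- data.frame(
  n = c(23L, 24L, 26L, 27L, 26L, 23L, 19L, 19L, 22L, 22L,
        22L, 22L, 26L, 26L, 19L, 22L, 26L, 25L, 22L, 18L)
)
tst$ym <- as.yearmon(2004 + rep(0:4, times = c(4, 5, 4, 4, 3)) / 12)

# Try grouping by yearmon directly; if this dplyr rejects the class,
# fall back to grouping by the equivalent Date.
direct <- tryCatch(
  tst %>% group_by(ym) %>% summarise(ave = mean(n)),
  error = function(e) {
    message("Grouping by yearmon failed: ", conditionMessage(e))
    tst %>% group_by(ym = as.Date(ym)) %>% summarise(ave = mean(n))
  }
)

print(direct)
```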
Source: https://stackoverflow.com/questions/30553761/using-dplyr-summary-function-on-yearmon-from-zoo