R: Function “diff” over various groups

旧时模样 submitted on 2019-12-12 02:06:25

Question


While searching for a solution to my problem I found this thread: Function "diff" over various groups in R. I've got a very similar question so I'll just work with the example there.

This is what my desired output should look like:

  name class year diff
1    a    c1 2009   NA
2    a    c1 2010   67
3    b    c1 2009   NA
4    b    c1 2010   20

I have two variables which form subgroups - class and name. So I want to compare only the values which have the same name and class. I also want to have the differences from 2009 to 2010. If there is no 2008, diff 2009 should return NA (since it can't calculate a difference).

I'm sure it works very similarly to the other thread but I just can't make it work. I used this code too (and simply solved the ascending year by sorting the data differently), but somehow R still manages to calculate a difference and does not return NA.

ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
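One likely reason no NA appears: diff() returns one element fewer than its input, and head(year, -1) drops the last year rather than the first, so the change ends up labelled with the earlier year. A minimal base R sketch, using the values for name "a" and class "c1" from the answers below (the variable names early and padded are only illustrative):

```r
# Values for name "a", class "c1", taken from the output in the answers
value <- c(33, 100)
year  <- c(2009, 2010)

d <- diff(value)   # one element shorter than value

# Pairing with head(year, -1) attaches the 2009 -> 2010 change to 2009,
# and the 2010 row disappears
early <- data.frame(year = head(year, -1), diff = d)

# Padding with a leading NA keeps both years: 2009 gets NA,
# 2010 gets the change
padded <- data.frame(year = year, diff = c(NA, d))
```

Printing early and padded side by side shows the difference between the two labelling choices.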

Answer 1:


Using dplyr

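  library(dplyr)

  df %>%
    filter(year != 2008) %>%            # drop the 2008 rows
    arrange(name, class, year) %>%      # years must ascend within each group
    group_by(class, name) %>%
    mutate(diff = c(NA, diff(value)))   # first year of each group gets NA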
  # Source: local data frame [12 x 5]
  #  Groups: class, name

  #     name class year value diff
  #  1     a    c1 2009    33   NA
  #  2     a    c1 2010   100   67
  #  3     a    c2 2009    80   NA
  #  4     a    c2 2010    90   10
  #  5     a    c3 2009    90   NA
  #  6     a    c3 2010   100   10
  #  7     b    c1 2009    80   NA
  #  8     b    c1 2010    90   10
  #  9     b    c2 2009    90   NA
  #  10    b    c2 2010   100   10
  #  11    b    c3 2009    80   NA
  #  12    b    c3 2010    99   19

Update:

With relative difference:

  df %>%
    filter(year != 2008) %>%
    arrange(name, class, year) %>%
    group_by(class, name) %>%
    mutate(diff1 = c(NA, diff(value)),
           # lag(value) is the previous year's value within the group
           rel_diff = round(diff1 / lag(value), 2))
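The relative difference is the change divided by the previous year's value in the same group. As a rough check of the arithmetic, here is a base R sketch for a single group, using the values for name "b" and class "c3" from the output above (prev and rel_diff are illustrative names, not part of the answer):

```r
# Values for name "b", class "c3", taken from the output above
value <- c(80, 99)

diff1 <- c(NA, diff(value))        # NA for 2009, 19 for 2010
prev  <- c(NA, head(value, -1))    # previous year's value, NA for the first year

rel_diff <- round(diff1 / prev, 2) # 19 / 80 = 0.2375, rounded to 0.24
```

Shifting the values by one position with an explicit leading NA keeps the divisor aligned with each row even when a group has more than two years.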



Answer 2:


Using the data set from the other post, I would do something like

library(data.table)
df <- df[df$year != 2008, ]            # drop the 2008 rows
setkey(setDT(df), class, name, year)   # convert by reference and sort by the key
# per (class, name) group: NA for the first year, then successive differences
df[, diff := c(NA, diff(value)), by = .(class, name)]

Which returns

df
#    name class year value diff
# 1:    a    c1 2009    33   NA
# 2:    a    c1 2010   100   67
# 3:    b    c1 2009    80   NA
# 4:    b    c1 2010    90   10
# 5:    a    c2 2009    80   NA
# 6:    a    c2 2010    90   10
# 7:    b    c2 2009    90   NA
# 8:    b    c2 2010   100   10
# 9:    a    c3 2009    90   NA
#10:    a    c3 2010   100   10
#11:    b    c3 2009    80   NA
#12:    b    c3 2010    99   19
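For readers without either package, the same grouped difference can be sketched in base R with ave(). This is an alternative technique, not what either answer uses. The data frame below is rebuilt from the 2009 and 2010 rows printed above; the 2008 rows of the original data are not shown in this thread, so they are left out.

```r
# Data rebuilt from the printed output (2009 and 2010 rows only)
df <- data.frame(
  name  = rep(c("a", "b"), each = 6),
  class = rep(rep(c("c1", "c2", "c3"), each = 2), times = 2),
  year  = rep(c(2009, 2010), times = 6),
  value = c(33, 100, 80, 90, 90, 100, 80, 90, 90, 100, 80, 99)
)

# Sort so that years ascend within each (name, class) group
df <- df[order(df$name, df$class, df$year), ]

# ave() applies the function per group and returns a vector
# aligned with the original rows
df$diff <- ave(df$value, df$class, df$name,
               FUN = function(x) c(NA, diff(x)))
```

The resulting diff column should match the one shown in both answers: NA for each 2009 row and the 2009 to 2010 change for each 2010 row.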


Source: https://stackoverflow.com/questions/24569177/r-function-diff-over-various-groups
