How to merge multiple rows by a given condition and sum?

為{幸葍}努か submitted on 2021-02-17 05:50:06

Question


I have long-format data with ID, state and time columns. Within each ID, I would like to merge the states s_2 and s_3 into a single state and sum their time values. Let's say I have data:

ID state time
1 s_1 4
1 s_2 6
1 s_3 7
2 s_1 2
2 s_2 12
2 s_3 5
2 s_4 4
3 s_1 10
3 s_2 2
3 s_3 3

that I'd like to convert into:

ID state time
1 s_1 4
1 s_2+ 13
2 s_1 2
2 s_2+ 17
2 s_4 4
3 s_1 10
3 s_2+ 5
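
For reproducibility, the input shown above could be rebuilt roughly like this. It is a minimal sketch: the name df is a placeholder, and the column types (integer ID and time, character state) are assumptions.

```r
# Rebuild the example input; the column types are an assumption.
df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4L, 6L, 7L, 2L, 12L, 5L, 4L, 10L, 2L, 3L),
  stringsAsFactors = FALSE
)
```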

Any ideas?


Answer 1:


Relabel the state values that should be merged, then group and sum.

library(dplyr)

df %>%
  group_by(ID, state = replace(state, state %in% c('s_2', 's_3'), 's_2+')) %>%
  summarise(time = sum(time))

#    ID state  time
#  <int> <chr> <int>
#1     1 s_1       4
#2     1 s_2+     13
#3     2 s_1       2
#4     2 s_2+     17
#5     2 s_4       4
#6     3 s_1      10
#7     3 s_2+      5

Or in base R:

aggregate(time~ID + state, transform(df,
          state = replace(state, state %in% c('s_2', 's_3'), 's_2+')), sum)
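
Note that aggregate() returns its rows with the first grouping column (ID) varying fastest, so the result comes out grouped by state rather than by ID. A minimal sketch of reordering it to match the layout asked for, using a stand-in df built from the question's data and assuming state is a character column:

```r
# Stand-in for the question's data (column types are an assumption).
df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4L, 6L, 7L, 2L, 12L, 5L, 4L, 10L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Relabel s_2 and s_3, then sum time per ID/state pair.
res <- aggregate(time ~ ID + state,
                 transform(df, state = replace(state, state %in% c("s_2", "s_3"), "s_2+")),
                 sum)

# Sort by ID, then state, so the rows follow the requested order.
res <- res[order(res$ID, res$state), ]
rownames(res) <- NULL
res
```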

If there are many such groups that you want to collapse, forcats::fct_collapse may be helpful.

df %>%
  group_by(ID, state = forcats::fct_collapse(state, `s_2+` = c('s_2', 's_3'))) %>%
  summarise(time = sum(time))
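
If adding forcats is not an option, a named lookup vector in base R can do a similar job for several merges at once. This is a sketch of an alternative technique, not a description of how fct_collapse works internally; the name lookup and the stand-in df are illustrative, and state is assumed to be a character column.

```r
# Stand-in for the question's data (column types are an assumption).
df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4L, 6L, 7L, 2L, 12L, 5L, 4L, 10L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from old labels to merged labels; add entries as needed.
lookup <- c(s_2 = "s_2+", s_3 = "s_2+")

# States found in the lookup take the merged label; all others stay unchanged.
hit <- df$state %in% names(lookup)
df$state[hit] <- unname(lookup[df$state[hit]])

# Sum time per ID/state pair and sort by ID, then state.
res <- aggregate(time ~ ID + state, df, sum)
res <- res[order(res$ID, res$state), ]
rownames(res) <- NULL
res
```

Each additional merge is then one more entry in lookup rather than another replace() call.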



Answer 2:


Try this:

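library(data.table)
setDT(df)  # converts df to a data.table by reference

# relabel s_2 and s_3 in place (assumes state is a character column)
df[state %in% c('s_2', 's_3'), state := 's_2+']

# sum time within each ID/state pair
df <- df[, .(time = sum(time)), by = .(ID, state)]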


Source: https://stackoverflow.com/questions/64167983/how-to-merge-multiple-rows-by-a-given-condition-and-sum
