Question
I have long-format data with ID, state and time columns. Within each ID, I would like to merge the rows for states s_2 and s_3 into a single row and sum their time values. Let's say I have data:
ID state time
1 s_1 4
1 s_2 6
1 s_3 7
2 s_1 2
2 s_2 12
2 s_3 5
2 s_4 4
3 s_1 10
3 s_2 2
3 s_3 3
that I'd like to convert into:
ID state time
1 s_1 4
1 s_2+ 13
2 s_1 2
2 s_2+ 17
2 s_4 4
3 s_1 10
3 s_2+ 5
Any ideas?
Answer 1:
Relabel the state values you want to merge, then group by ID and state and sum the time.
library(dplyr)
df %>%
  group_by(ID, state = replace(state, state %in% c('s_2', 's_3'), 's_2+')) %>%
  summarise(time = sum(time))
# ID state time
# <int> <chr> <int>
#1 1 s_1 4
#2 1 s_2+ 13
#3 2 s_1 2
#4 2 s_2+ 17
#5 2 s_4 4
#6 3 s_1 10
#7 3 s_2+ 5
Or in base R:
aggregate(time ~ ID + state,
          transform(df, state = replace(state, state %in% c('s_2', 's_3'), 's_2+')),
          sum)
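For anyone who wants to run the base R version end to end, here is a minimal sketch that rebuilds the sample data from the question first. The object names relabelled and result are my own, and the final reordering step is an addition (aggregate() returns rows sorted by state first, not by ID).

```r
# Sample data copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4, 6, 7, 2, 12, 5, 4, 10, 2, 3),
  stringsAsFactors = FALSE
)

# Give s_2 and s_3 a shared label so they fall into the same group
relabelled <- transform(
  df,
  state = replace(state, state %in% c("s_2", "s_3"), "s_2+")
)

# Sum time within each ID / state combination
result <- aggregate(time ~ ID + state, data = relabelled, FUN = sum)

# Reorder by ID, then state, to match the layout asked for in the question
result <- result[order(result$ID, result$state), ]
rownames(result) <- NULL
print(result)
```

The time column of result should come out as 4, 13, 2, 17, 4, 10, 5, matching the desired output in the question.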
If there are many such groups that you want to collapse, forcats::fct_collapse may be helpful.
df %>%
  group_by(ID, state = forcats::fct_collapse(state, `s_2+` = c('s_2', 's_3'))) %>%
  summarise(time = sum(time))
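If you would rather avoid the extra dependency, a named lookup vector in base R can play a similar role when several groups need collapsing. This is only a sketch: merge_map and hit are hypothetical names, and the map below holds just the one group from the question; further pairs would be added in the same way.

```r
# Sample data copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4, 6, 7, 2, 12, 5, 4, 10, 2, 3),
  stringsAsFactors = FALSE
)

# Lookup vector: names are the original states, values are the merged labels.
# More old = "new" pairs can be appended to collapse additional groups.
merge_map <- c(s_2 = "s_2+", s_3 = "s_2+")

# States present in the map get the merged label; all others are left alone
hit <- df$state %in% names(merge_map)
df$state[hit] <- unname(merge_map[df$state[hit]])

# Sum time within each ID / state combination
result <- aggregate(time ~ ID + state, data = df, FUN = sum)
print(result)
```

Keeping the mapping in one vector means the grouping rules live in a single place, which is easier to maintain than a long chain of replace() calls.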
Answer 2:
Try this:
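library(data.table)
library(dplyr)  # provides case_when()
setDT(df)  # convert df to a data.table by reference
# Relabel s_2 and s_3 as s_2+; all other states keep their label
df[, state := case_when(
  state %in% c('s_2', 's_3') ~ "s_2+",
  TRUE ~ state
)]
# Sum time within each ID and state
df <- df[, .(time = sum(time)), by = .(ID, state)]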
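The same idea can also be written with data.table alone. The sketch below assumes state is a character column and a data.table version that includes fifelse(). It computes the merged label inside by, so the original state column is not overwritten; dt and result are my own names.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4, 6, 7, 2, 12, 5, 4, 10, 2, 3)
)

# Group by ID and a relabelled state computed on the fly, then sum time
result <- dt[
  ,
  .(time = sum(time)),
  by = .(ID, state = fifelse(state %in% c("s_2", "s_3"), "s_2+", state))
]
print(result)
```

Because by keeps groups in order of first appearance, the rows come out ordered by ID and then state for this input, with no extra sorting step.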
Source: https://stackoverflow.com/questions/64167983/how-to-merge-multiple-rows-by-a-given-condition-and-sum