Cumulative count of values in R

Posted by 感情迁移 on 2019-12-07 09:44:21

Question


I hope you are doing very well. I would like to know how to calculate a cumulative count over a data set under certain conditions. A simplified version of my data set looks like this:

t   id  
A   22
A   22
R   22
A   41
A   98
A   98
A   98
R   98
A   46
A   46
R   46
A   46
A   46
A   46
R   46
A   46
A   12
R   54
A   66
R   13
A   13
A   13
A   13
A   13
R   13
A   13

I would like to create a new data set that holds, for each value of "id", the cumulative number of times that id has appeared, with the count restarting (at 0) whenever t = R, e.g.:

t   id  count
A   22  1
A   22  2
R   22  0
A   41  1
A   98  1
A   98  2
A   98  3
R   98  0
A   46  1
A   46  2
R   46  0
A   46  1
A   46  2
A   46  3
R   46  0
A   46  1
A   12  1
R   54  0
A   66  1
R   13  0
A   13  1
A   13  2
A   13  3
A   13  4
R   13  0
A   13  1

Any ideas as to how to do this? Thanks in advance.


Answer 1:


Using rle:

out <- transform(df, count = sequence(rle(do.call(paste, df))$lengths))
out$count[out$t == "R"] <- 0

If your data.frame has more than these two columns and you want to check only these two, just replace the df inside do.call(paste, df) with df[, 1:2] or df[, c("t", "id")].
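To make the rle approach easy to try, here is a self-contained sketch that rebuilds the sample data from the question and applies it. The data frame construction and the names keys and expected are assumptions added for illustration; the two lines that compute the count are the ones from the answer.

```r
# Sample data copied from the question (t kept as character, not factor)
df <- data.frame(
  t  = c("A","A","R","A","A","A","A","R","A","A","R","A","A","A","R","A",
         "A","R","A","R","A","A","A","A","R","A"),
  id = c(22,22,22,41,98,98,98,98,46,46,46,46,46,46,46,46,
         12,54,66,13,13,13,13,13,13,13),
  stringsAsFactors = FALSE
)

# One key per row, e.g. "A 22"; consecutive equal keys form a run
keys <- do.call(paste, df)

# rle() gives the length of each run, sequence() numbers the rows inside it
out <- transform(df, count = sequence(rle(keys)$lengths))

# "R" rows reset the counter, so they are reported as 0
out$count[out$t == "R"] <- 0

# Desired counts as listed in the question
expected <- c(1,2,0,1,1,2,3,0,1,2,0,1,2,3,0,1,1,0,1,0,1,2,3,4,0,1)
print(all(out$count == expected))
```

Because an "R" row always has a different key from the "A" rows around it, it ends the current run, which is why the count restarts after each reset.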

If you find do.call(paste, df) dangerous (as @flodel comments), then you can replace that with:

as.character(interaction(df))

I personally don't find anything dangerous or clumsy about this setup, as long as you pick the right separator (which means knowing your data well). If you do, however, the second solution may help.
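The separator concern can be illustrated with a small sketch. The data frame bad is hypothetical (it is not from the question) and is built so that two different rows collapse to the same key under paste's default separator, a single space. The "\r" separator is only an assumption about what never occurs in the data.

```r
# Hypothetical rows whose values contain spaces themselves
bad <- data.frame(t  = c("A 1", "A"),
                  id = c("2", "1 2"),
                  stringsAsFactors = FALSE)

# Default separator " ": both rows collapse to the same key, "A 1 2"
keys_default <- do.call(paste, bad)

# A separator assumed to be absent from the data keeps the rows distinct
keys_safe <- do.call(paste, c(bad, sep = "\r"))

print(rle(keys_default)$lengths)  # one run of length 2 (rows wrongly merged)
print(rle(keys_safe)$lengths)     # two runs of length 1
```

With data like the question's (a single letter and a number), the default separator cannot cause such a collision.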


Update:

For those who don't like using do.call(paste, df) or as.character(interaction(df)) (please see the comment exchanges between me, @flodel and @HongOoi), here's another base solution:

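idx <- which(df$t == "R")   # positions of the reset rows
ww <- NULL
if (length(idx) > 0) {
    # block sizes: each block ends at an "R" row; the last one holds any rows after the final "R"
    ww <- c(min(idx), diff(idx), nrow(df) - max(idx))
    # within each block, number the rows of every run of identical ids
    df <- transform(df, count = ave(id, rep(seq_along(ww), ww),
                   FUN = function(y) sequence(rle(y)$lengths)))
    df$count[idx] <- 0
} else {
    # no "R" rows: the count still restarts whenever id changes
    df$count <- sequence(rle(df$id)$lengths)
}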
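For comparison, the same run counting can be sketched without pasting columns together or computing block sizes, by flagging the rows where a new run starts and taking a cumulative sum. This is a different technique from the ones in the answer, and the function name count_runs and the helper variables are hypothetical. It assumes at least one row and that t and id can be compared with !=.

```r
count_runs <- function(df) {
  n <- nrow(df)
  # A new run starts at row 1 and wherever id or t differs from the previous row
  new_run <- c(TRUE, df$id[-1] != df$id[-n] | df$t[-1] != df$t[-n])
  # cumsum() turns the start flags into a run identifier per row
  run_id <- cumsum(new_run)
  # Number the rows within each run: 1, 2, 3, ...
  count <- ave(seq_len(n), run_id, FUN = seq_along)
  # "R" rows reset the counter and are reported as 0
  count[df$t == "R"] <- 0
  count
}

# Small hypothetical example: the "R" row splits the run of id 1
small <- data.frame(t = c("A", "A", "R", "A"), id = c(1, 1, 1, 1),
                    stringsAsFactors = FALSE)
small$count <- count_runs(small)
print(small)
```

Since the columns are compared directly, no separator has to be chosen.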


Source: https://stackoverflow.com/questions/17245349/cumulative-count-of-values-in-r
