Is there a dplyr equivalent to data.table::rleid?

Posted by 那年仲夏 on 2019-11-26 00:20:51

Question


data.table offers a nice convenience function, rleid, which assigns an id to each run of consecutive identical values:

library(data.table)
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)), value=1:10)
rleid(DT$grp)
# [1] 1 1 2 2 3 3 3 4 5 5

I can mimic this in base R with:

df <- data.frame(DT)
rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)
# [1] 1 1 2 2 3 3 3 4 5 5

Does anyone know of a dplyr equivalent? Or is the "best" way to get the rleid behavior with dplyr to do something like the following?

library(dplyr)

my_rleid = rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)

df %>%
  mutate(rleid = my_rleid)

Answer 1:


You can just do (when you have both data.table and dplyr loaded):

DT <- DT %>% mutate(rlid = rleid(grp))

this gives:

> DT
    grp value rlid
 1:   A     1    1
 2:   A     2    1
 3:   B     3    2
 4:   B     4    2
 5:   C     5    3
 6:   C     6    3
 7:   C     7    3
 8:   A     8    4
 9:   B     9    5
10:   B    10    5

When you don't want to load data.table separately you can also use (as mentioned by @DavidArenburg in the comments):

DT <- DT %>% mutate(rlid = data.table::rleid(grp))

And as @RichardScriven said in his comment you can just copy/steal it:

myrleid <- data.table::rleid



Answer 2:


If you want to use just base R and dplyr, the better way is to wrap your own one- or two-line version of rleid() in a function and then apply it whenever you need it.

library(dplyr)

myrleid <- function(x) {
    x <- rle(x)$lengths
    rep(seq_along(x), times=x)
}

## Try it out
DT <- DT %>% mutate(rlid = myrleid(grp))
DT
#   grp value rlid
# 1:   A     1    1
# 2:   A     2    1
# 3:   B     3    2
# 4:   B     4    2
# 5:   C     5    3
# 6:   C     6    3
# 7:   C     7    3
# 8:   A     8    4
# 9:   B     9    5
#10:   B    10    5
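
One caveat with rle()-based versions: base R's rle() treats every missing value as the start of a new run, even when the previous value is also missing. If consecutive NAs should share one id, a variant along the following lines may help. This is only a sketch; the name rleid_na is made up for illustration and is not part of any package.

```r
# Hypothetical helper (base R only): run ids where consecutive NAs
# are counted as one run instead of one run per NA.
rleid_na <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  prev <- x[-n]    # elements 1 .. n-1
  curr <- x[-1L]   # elements 2 .. n
  # A new run starts when exactly one of the pair is NA,
  # or when both are non-missing and differ.
  changed <- (is.na(prev) != is.na(curr)) |
    (!is.na(prev) & !is.na(curr) & prev != curr)
  cumsum(c(TRUE, changed))
}

x <- c("A", "A", "B", NA, NA, "B")
ids_na  <- rleid_na(x)
ids_rle <- with(rle(x), rep(seq_along(lengths), lengths))
ids_na
# [1] 1 1 2 3 3 4
ids_rle
# [1] 1 1 2 3 4 5
```

It can be used inside mutate() in the same way as myrleid().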



Answer 3:


You can do it using the lag function from dplyr.

DT <-
    DT %>%
    mutate(rleid = (grp != lag(grp, 1, default = "asdf"))) %>%
    mutate(rleid = cumsum(rleid))

gives

> DT
    grp value rleid
 1:   A     1     1
 2:   A     2     1
 3:   B     3     2
 4:   B     4     2
 5:   C     5     3
 6:   C     6     3
 7:   C     7     3
 8:   A     8     4
 9:   B     9     5
10:   B    10     5
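
The "asdf" default above is a sentinel, so it would miscount if the first value of grp happened to be "asdf". A possible variation, sketched here on a plain data frame built from the question's data, uses the column's own first value as the default and adds 1 afterwards. Like the version above, it assumes grp contains no NA values.

```r
library(dplyr)

# Same data as in the question, as a plain data frame
df <- data.frame(
  grp = rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)),
  value = 1:10
)

# The first row compares grp[1] with itself (FALSE), so the running
# count of changes starts at 0; adding 1L makes the ids start at 1.
res <- df %>%
  mutate(rleid = cumsum(grp != lag(grp, default = first(grp))) + 1L)

res$rleid
# [1] 1 1 2 2 3 3 3 4 5 5
```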



Answer 4:


A simplification (involving no additional package) of the approach used by the OP could be:

DT %>%
 mutate(rleid = with(rle(grp), rep(seq_along(lengths), lengths)))

   grp value rleid
1    A     1     1
2    A     2     1
3    B     3     2
4    B     4     2
5    C     5     3
6    C     6     3
7    C     7     3
8    A     8     4
9    B     9     5
10   B    10     5

Or:

DT %>%
 mutate(rleid = rep(seq_along(ls <- rle(grp)$lengths), ls))
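
An edge case worth checking with one-liners like these is a column that consists of a single run. seq() called on a length-one vector n expands to 1:n, whereas seq_along() always counts elements. A small base R sketch of the difference (the helper name run_ids is hypothetical):

```r
# Hypothetical helper (base R only): run ids via rle() and seq_along()
run_ids <- function(x) {
  lens <- rle(x)$lengths
  rep(seq_along(lens), lens)
}

x <- rep("A", 3)             # a single run of length 3
lens <- rle(x)$lengths       # 3
via_seq <- seq(lens)         # 1 2 3: one id per element, not per run
via_along <- seq_along(lens) # 1: one id for the single run
ids <- run_ids(x)
ids
# [1] 1 1 1
```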


Source: https://stackoverflow.com/questions/33507868/is-there-a-dplyr-equivalent-to-data-tablerleid
