R, dplyr: cumulative version of n_distinct


I have a dataframe as follows. It is ordered by column time.

Input -

df = data.frame(time = 1:20,
            grp = sort(rep(1:5,4)),
             


        
4 Answers
  •  南旧 (original poster)
    2021-02-09 03:56

    Try:

    Update

    With your new dataset, here is an approach in base R:

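      # For each grp, mark the first occurrence of every var1 value with 1,
      # then take the running total to count the distinct values seen so far.
      # Assumes df is sorted by grp so the unlisted result lines up with the rows.
      df$var2 <- unlist(lapply(split(df, df$grp), function(x) {
        x$var2 <- 0
        indx <- match(unique(x$var1), x$var1)
        x$var2[indx] <- 1
        cumsum(x$var2)
      }), use.names = FALSE)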
    
      head(df,7)
      #   time grp var1 var2
      # 1    1   1    A    1
      # 2    2   1    B    2
      # 3    3   1    A    2
      # 4    4   1    B    2
      # 5    5   2    A    1
      # 6    6   2    B    2
      # 7    7   2    A    2
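
    The same per-group running count can also be sketched more compactly with `ave()` and `duplicated()`. This is only an illustration: the small data frame below is hypothetical (the `var1` values are an assumption, since the question's input is cut off), and it is not the code from the answer above. Unlike the `split()`/`unlist()` version, it should not depend on the rows being sorted by `grp`.

```r
# Hypothetical data: the var1 values are assumed, not taken from the question.
df <- data.frame(time = 1:8,
                 grp  = rep(1:2, each = 4),
                 var1 = c("A", "B", "A", "B", "A", "A", "C", "B"))

# ave() passes the row indices of each grp to FUN. Within a group,
# !duplicated() is TRUE at the first occurrence of each var1 value,
# so its cumulative sum is the number of distinct values seen so far.
df$var2 <- ave(seq_len(nrow(df)), df$grp,
               FUN = function(i) cumsum(!duplicated(df$var1[i])))

print(df)
```

    The expression `cumsum(!duplicated(var1))` can also be used inside `dplyr::mutate()` after `group_by(grp)`, which is probably the closest thing to a cumulative `n_distinct`.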
    
