Cumulative number of unique values in a column up to current row

清歌不尽 2020-12-11 23:45

I have a data frame, donorInfo, with donor information:

id        giftdate     giftamt
002       2001-01-05     25.00
033       2001-05-08     5         


        
2 Answers
  • 2020-12-11 23:50

    You can do this with duplicated() and cumsum(), taking advantage of the fact that logical vectors are coerced to numeric (TRUE becomes 1, FALSE becomes 0) when summed:

    # Example data.frame with some duplicated ids
    df <- read.table(text="
    id   giftdate giftamt
     2 2001-01-05      25
    33 2001-05-08      50
     2 2001-09-22     125
    33 2001-11-05      40
    42 2001-12-04      75", header=TRUE)
    
    cumsum(!duplicated(df$id))
    # [1] 1 2 2 2 3
    
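    To make the mechanics explicit, here is a minimal sketch that breaks the one-liner into its intermediate steps and stores the result as a column. The column name numUnique is borrowed from the other answer, and sorting by giftdate is an assumption about what "up to the current row" should mean; the data is the same made-up sample as above.

    ```r
    # Made-up sample data mirroring the example above
    df <- data.frame(
      id       = c(2, 33, 2, 33, 42),
      giftdate = as.Date(c("2001-01-05", "2001-05-08", "2001-09-22",
                           "2001-11-05", "2001-12-04")),
      giftamt  = c(25, 50, 125, 40, 75)
    )

    # Assumption: rows should be in date order before counting
    df <- df[order(df$giftdate), ]

    is_repeat <- duplicated(df$id)  # TRUE where the id already appeared in an earlier row
    is_new    <- !is_repeat         # TRUE the first time each id is seen
    df$numUnique <- cumsum(is_new)  # running count of distinct ids seen so far

    print(df)
    ```

    Because duplicated() only flags the second and later occurrences, each id contributes exactly once to the running sum, at the row where it first appears.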
  • 2020-12-11 23:50

    Try something like this:

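    # For each row rn, count the distinct ids in rows 1..rn.
    # seq_len() avoids the c(1, 0) pitfall of seq() on a zero-row data frame.
    donorInfo$numUnique <- vapply(seq_len(nrow(donorInfo)), function(rn) {
      length(unique(donorInfo$id[seq_len(rn)]))
    }, integer(1))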
    

    This is not the most efficient solution, since it recomputes unique() over a growing prefix for every row, but it should work.
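
    As a rough sanity check, the row-by-row count can be compared against the cumsum(!duplicated()) approach from the other answer. This is only a sketch on randomly generated ids; the vector ids and the names slow and fast are made up for illustration.

    ```r
    set.seed(1)
    # Hypothetical ids: 1000 draws from 50 possible donors
    ids <- sample(1:50, 1000, replace = TRUE)

    # Row-by-row version: distinct ids among the first rn elements
    slow <- vapply(seq_along(ids), function(rn) {
      length(unique(ids[seq_len(rn)]))
    }, integer(1))

    # Vectorised version: running count of first occurrences
    fast <- cumsum(!duplicated(ids))

    # Both approaches should give the same running counts
    stopifnot(identical(slow, fast))
    ```

    The two agree, but the vectorised version makes a single pass over the ids, which matters once the data frame gets large.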
