Cumulative number of unique values in a column up to current row

Submitted by 十年热恋 on 2019-12-29 01:44:05

Question


I have a data frame, donorInfo, with donor information:

id        giftdate     giftamt
002       2001-01-05     25.00
033       2001-05-08     50.00
054       2001-09-22    125.00
125       2001-11-05     40.00
042       2001-12-04     75.00
...           ...         ...

I'd like to create a column that shows the cumulative number of unique donor IDs up to that date. I think it's something like:

donorInfo$numUnique <- apply/lapply (donorInfo, 1, FUN=nrow(unique(donorInfo$id)))

Unfortunately, this isn't working, and I'm wondering how to fix it. Thanks for any suggestions.


Answer 1:


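You can do this with duplicated() and cumsum(), taking advantage of the fact that logical vectors are coerced to numeric (TRUE becomes 1, FALSE becomes 0). !duplicated() is TRUE at the first occurrence of each id, so its running sum is the number of distinct ids seen so far: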

# Example data.frame with some duplicated ids
df <- read.table(text="
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header=TRUE)

cumsum(!duplicated(df$id))
# [1] 1 2 2 2 3
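
To store the result as a column, as the question asks, the same idea can be applied to the data frame directly. The following is only a sketch under a few assumptions: the sample rows are made up (reusing the ids from the example above), the ids are kept as character so that values like "002" keep their leading zeros, and the rows are sorted by giftdate first so that "up to the current row" also means "up to that date".

```r
# Hypothetical sample data; ids stored as character to preserve leading zeros
donorInfo <- data.frame(
  id       = c("002", "033", "002", "033", "042"),
  giftdate = as.Date(c("2001-01-05", "2001-05-08", "2001-09-22",
                       "2001-11-05", "2001-12-04")),
  giftamt  = c(25, 50, 125, 40, 75),
  stringsAsFactors = FALSE
)

# Assumption: the count should follow date order, so sort by giftdate first
donorInfo <- donorInfo[order(donorInfo$giftdate), ]

# TRUE at the first occurrence of each id; cumsum() counts the TRUEs so far
donorInfo$numUnique <- cumsum(!duplicated(donorInfo$id))
donorInfo$numUnique
# [1] 1 2 2 2 3
```

If the data frame is not sorted by date, the counts still run in row order, which may not be what "up to that date" is meant to express.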



Answer 2:


Try something like this:

# For each row rn, count the distinct ids in rows 1..rn
donorInfo$numUnique <- sapply(seq_len(nrow(donorInfo)), function(rn) {
  length(unique(donorInfo$id[seq_len(rn)]))
})

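This is no doubt not the most efficient solution, since it re-scans the first rn rows for every row and the work therefore grows quadratically with the number of rows, but it should work.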
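
As a quick sanity check, the row-by-row approach can be compared with the cumsum(!duplicated()) approach on random data. This is a rough sketch: the vector ids, its size, and the seed are arbitrary choices for illustration, and vapply() is used in place of sapply() only to pin down the return type.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
ids <- sample(1:50, 2000, replace = TRUE)  # made-up ids with many repeats

# Running sum of "first occurrence" flags
fast <- cumsum(!duplicated(ids))

# Row-by-row count of distinct ids in positions 1..rn
slow <- vapply(seq_along(ids),
               function(rn) length(unique(ids[seq_len(rn)])),
               integer(1))

# Both approaches should produce the same integer vector
stopifnot(identical(fast, slow))
```

On larger inputs the first version should be noticeably faster, because it makes a single pass over the ids instead of one pass per row.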



Source: https://stackoverflow.com/questions/8450981/cumulative-number-of-unique-values-in-a-column-up-to-current-row
