Combine observations based on the variable ID if at least 5 IDs are combined

Submitted by 假装没事ソ on 2019-12-12 04:28:29

Question


Last week I posted the following question. The idea was to write a loop that builds datasets by combining observations according to the variable "id", one dataset per combination of ids.

For instance:

  • dataset 1: combinations of id 1, 2, 3, 4, 5, 6, 7, 8...
  • dataset 2: combinations of id 1, 2, 3
  • dataset 3: combinations of id 2, 3, 4, 5
  • dataset 4: combinations of id 5, 6, 7, 8, 9, 10...

I got a perfect answer to the question:

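result <- NULL                      # start empty so rbind() has something to append to
for (i in 2:max(o$id)) {            # i = number of ids per combination (assumes ids run 1..n)
  combis <- combn(unique(o$id), i)  # each column holds one combination of i ids
  for (j in 1:ncol(combis)) {
    sub <- o[o$id %in% combis[, j], ]                # observations belonging to this combination
    out <- sub[1, ]                                  # use your function
    out$label <- paste(combis[, j], collapse = '')   # label the result with its combination of ids
    result <- rbind(result, out)                     # append to the previous output
  }
}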

However, my question now is the following: is there a way to specify that I only want combinations of at least 5 ids? The process takes a lot of computing time, and I noticed that small datasets (with fewer than 5 different ids) give biased results.

A sample of the dataset and the full code to reproduce the example can be found through this link. Please be aware that the entire code can take a while to run unless it is restricted to combinations of at least 5 ids.


Answer 1:


You can start the loop at 5:

for(i in 5:max(o$id)){
  combis=combn(unique(o$id),i)
   ...

This way, each combination contains at least 5 ids, because the second argument of combn sets how many elements are chosen per combination (see ?combn).
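For illustration, here is a minimal self-contained sketch of the same idea on toy data. The data frame contents, the `min_size` and `pieces` names, and the guard on the number of ids are assumptions made for this example and are not part of the original code. The guard is there because `5:n` counts downwards in R when `n` is smaller than 5, which would make `combn` fail.

```r
# Toy data standing in for the real dataset `o` (hypothetical values).
o <- data.frame(id = rep(1:6, each = 2), value = 1:12)

ids <- unique(o$id)
min_size <- 5                      # smallest number of ids per combination
pieces <- list()                   # collect one row per combination, bind once at the end

if (length(ids) >= min_size) {     # guard: 5:n would count down when n < 5
  for (i in min_size:length(ids)) {
    combis <- combn(ids, i)        # each column is one combination of i ids
    for (j in seq_len(ncol(combis))) {
      sub <- o[o$id %in% combis[, j], ]
      out <- sub[1, ]              # placeholder for the real per-subset function
      out$label <- paste(combis[, j], collapse = "")
      pieces[[length(pieces) + 1]] <- out
    }
  }
}

result <- do.call(rbind, pieces)
nrow(result)  # choose(6, 5) + choose(6, 6) = 7 combinations
```

Collecting the rows in a list and calling rbind once avoids growing a data frame inside the loop, which may help a little with run time. The number of combinations still grows quickly with the number of ids, so it can be worth checking sum(choose(length(ids), min_size:length(ids))) before running the full loop.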



Source: https://stackoverflow.com/questions/40636032/combine-observations-based-on-the-variable-id-if-at-least-5-ids-are-combined
