apply-strsplit-rowwise including sort and nested paste

£可爱£侵袭症+ submitted on 2021-01-28 05:36:01

Question


I guess I just don't see it, but none of the similar things I found on the net, in the mailing list archives, or in the FAQ resolved my issue.

The closest I have found was this: apply strsplit rowwise

I have a data frame with two character columns and one numeric column, filled like this:

df <- data.frame(name1=c("A","B","C","D"),
                 name2=c("B","A","D","C"),
                 nums=c(1,1,4,4),
                 stringsAsFactors=FALSE)

Now I would like to find the unique rows in this data frame, but based only on the two name columns. The order of the two names within a row does not matter (A, B counts the same as B, A), so I cannot use duplicated directly, if I understand it correctly.

So I thought about combining the two name columns row by row, sorting each pair, and pasting the sorted vector (of length 2) back into a single string, using something like sapply.

However, I could not get it to work.

So far I have used a for loop, but it takes ages on the original data.

for (i in seq_len(nrow(df))) {
    # sort the two names so that ("A","B") and ("B","A") give the same key
    mysort <- sort(c(df$name1[i], df$name2[i]))
    df$combname[i] <- paste(mysort[1], mysort[2])
}

Any suggestions are welcome. Maybe I am just misunderstanding unique and sapply.


Answer 1:


A solution without a for loop:

df$combname <- apply(df[1:2], 1, function(x) paste(sort(x), collapse=""))
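The line above only builds the key. To get from the key to the unique rows the question asks for, one possible follow-up is duplicated(). The sketch below is an illustration, not part of the original answer: the variable name unique_rows and the "|" separator are assumptions. The separator assumes "|" never occurs in the names; with collapse="" two different pairs such as ("AB","C") and ("A","BC") would produce the same key.

```r
df <- data.frame(name1 = c("A", "B", "C", "D"),
                 name2 = c("B", "A", "D", "C"),
                 nums  = c(1, 1, 4, 4),
                 stringsAsFactors = FALSE)

# Order-insensitive key: sort each pair of names, then join them with a
# separator that is assumed not to appear in any name.
df$combname <- apply(df[1:2], 1, function(x) paste(sort(x), collapse = "|"))

# Keep only the first row seen for each key.
unique_rows <- df[!duplicated(df$combname), ]
print(unique_rows)
```

Note that duplicated() keeps the first occurrence of each pair, so which nums value survives depends on row order if the duplicates differ in that column.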



Answer 2:


Perhaps you should explore the "data.table" package. Here's one approach:

library(data.table)
DT <- data.table(df)
DT[, new := paste(sort(c(name1, name2)), collapse = ""), by = 1:nrow(DT)]
DT
#    name1 name2 nums new
# 1:     A     B    1  AB
# 2:     B     A    1  AB
# 3:     C     D    4  CD
# 4:     D     C    4  CD
DT[!duplicated(new), ]
#    name1 name2 nums new
# 1:     A     B    1  AB
# 2:     C     D    4  CD
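Both answers build the key one row at a time. Since the question mentions speed on the original data, a vectorised variant may be worth trying. The sketch below swaps in a different technique from either answer: base R pmin() and pmax(), which compare two vectors element by element and also accept character vectors. The names first, second, keep and unique_rows are hypothetical, and whether this is actually faster on a given data set is untested here.

```r
df <- data.frame(name1 = c("A", "B", "C", "D"),
                 name2 = c("B", "A", "D", "C"),
                 nums  = c(1, 1, 4, 4),
                 stringsAsFactors = FALSE)

# Elementwise smaller and larger name of each row, computed for the
# whole column at once instead of row by row.
first  <- pmin(df$name1, df$name2)
second <- pmax(df$name1, df$name2)

# duplicated() on a two-column data frame flags repeated (first, second)
# pairs, so no pasted string key is needed.
keep <- !duplicated(data.frame(first, second))
unique_rows <- df[keep, ]
print(unique_rows)
```

Because the pair is kept as two separate columns, this variant avoids the question of which separator to paste with.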


Source: https://stackoverflow.com/questions/19062699/apply-strsplit-rowwise-including-sort-and-nested-paste
