Concatenate without duplicates dataframe r

Submitted by 随声附和 on 2020-01-04 02:42:08

Question


I have a dataframe where I would like to concatenate certain columns.

My issue is that the text in these columns may or may not contain duplicate information. I would like to strip out the duplicates in order to retain only the relevant information.

For example, if I had a data frame such as:

  Animal1         Animal2        Label  
1 cat dog         dolphin        19
2 dog cat         cat            72
3 pilchard 26     koala          26
4 newt bat 81     bat            81

You can see that in row 2, 'cat' appears in both 'Animal1' and 'Animal2'. In row 3, the number 26 appears in both 'Animal1' and 'Label'. In row 4, the values in 'Animal2' and 'Label' are both already contained, in the same order, in 'Animal1'.

So by using the paste function I can concatenate the columns...

data1 <- paste(data$Animal1, data$Animal2, data$Label, sep = " ")

However, I haven't managed yet to remove duplicates. The output I'm getting is of course just from my concatenation:

  Output1
1 cat dog dolphin 19
2 dog cat cat 72
3 pilchard 26 koala 26
4 newt bat 81 bat 81

Row 1 is fine, but the other rows contain duplicates as described above.

The output I would desire is:

  Output1
1 cat dog dolphin 19
2 dog cat 72
3 pilchard koala 26
4 newt bat 81

I tried removing duplicates after concatenating. I know that within a single string you can do something like the example below (see "Removing duplicate words in a string in R").

d <- unlist(strsplit(data1, split=" "))
paste(d[-which(duplicated(d))], collapse = ' ')

This worked for me on a single string, but I couldn't apply it to the whole column: I received an 'unexpected symbol' error referring to the square brackets.

I have seen that there is also the unique() function, e.g. in "Remove Duplicated String in a Row" and "Deleting reversed duplicates with R":

reduce_row = function(i) {
  split = strsplit(i, split=", ")[[1]]
  paste(unique(split), collapse = ", ") 
}
data1$v2 = apply(data1, 1, reduce_row)

I tried to adapt these examples, but so far without success.

Any assistance would be very much appreciated.


Answer 1:


After you've run data1 <- paste(data$Animal1, data$Animal2, data$Label, sep = " "), split each string on spaces, keep the unique words, and paste them back together:

data.frame(Output1 = vapply(strsplit(data1, " +"), function(x) paste(unique(x), collapse = " "), character(1)))
#              Output1
# 1 cat dog dolphin 19
# 2         dog cat 72
# 3  pilchard 26 koala
# 4        newt bat 81
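
Note that unique() keeps the first occurrence of each word, so row 3 comes out as "pilchard 26 koala" rather than the "pilchard koala 26" asked for in the question. If that ordering matters, one possible variant is to keep the last occurrence instead, using duplicated() with fromLast = TRUE. The following is only a sketch under that assumption: the data frame is reconstructed from the question's table, and dedupe_keep_last is a hypothetical helper name, not something from the original answer.

```r
# Sample data reconstructed from the table in the question (an assumption
# about the column types: Animal1/Animal2 character, Label numeric)
data <- data.frame(
  Animal1 = c("cat dog", "dog cat", "pilchard 26", "newt bat 81"),
  Animal2 = c("dolphin", "cat", "koala", "bat"),
  Label   = c(19, 72, 26, 81),
  stringsAsFactors = FALSE
)

# Concatenate the columns row by row, as in the question
data1 <- paste(data$Animal1, data$Animal2, data$Label, sep = " ")

# Hypothetical helper: split one string into words and drop repeats,
# keeping the LAST occurrence of each word instead of the first
dedupe_keep_last <- function(x) {
  words <- strsplit(x, " +")[[1]]
  paste(words[!duplicated(words, fromLast = TRUE)], collapse = " ")
}

# Apply the helper to every element of the concatenated vector
Output1 <- vapply(data1, dedupe_keep_last, character(1), USE.NAMES = FALSE)

data.frame(Output1 = Output1)
```

Indexing with the logical vector !duplicated(...) is also safer than -which(duplicated(...)), because the latter returns an empty vector when a row has no duplicates at all.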


Source: https://stackoverflow.com/questions/42817447/concatenate-without-duplicates-dataframe-r
