How do I keep only unique words within each string in a vector?

陌清茗 2020-12-03 09:03

I have data that looks like this:

vector = c("hello I like to code hello", "Coding is fun", "fun fun fun")

I want to remove duplicate words within each string, so that each word appears only once per string.

1 Answer
  •  抹茶落季
    2020-12-03 09:44

    Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:

    vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
    # [1] "hello I like to code" "Coding is fun"        "fun"
    
    ## OR
    vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
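
To make the one-liner easier to follow, here is the same pipeline broken into its three steps. This is only a sketch using the sample data from the question; the intermediate names words and uniq_words are illustrative and not part of the original answer.

```r
# Same sample data as in the question
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces; the result is a list of character vectors
words <- strsplit(vector, " ")

# Step 2: drop repeated words within each element, keeping the first occurrence of each
uniq_words <- lapply(words, unique)

# Step 3: collapse each element back into a single string
result <- vapply(uniq_words, paste, character(1L), collapse = " ")
result
# [1] "hello I like to code" "Coding is fun"        "fun"
```

Note that unique() compares strings exactly, so "Fun" and "fun" would count as different words unless the input is lower-cased first.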
    

    Update based on comments

    You can always write a custom function to pass to vapply. For instance, here's a function that takes the words of one split string, optionally keeps only the unique ones (onlyUnique), and drops any word that is not longer than minLen characters.

    myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
      a <- if (isTRUE(onlyUnique)) unique(x) else x
      paste(a[nchar(a) > minLen], collapse = " ")
    }
    

    Compare the output of the following to see how it would work.

    vapply(strsplit(vector, " "), myFun, character(1L))
    vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
    vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
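
For reference, the three calls can be run on the question's sample data as sketched below. The function is copied from the answer; the names words, res_default, res_dups and res_all are illustrative only.

```r
# Sample data from the question and the helper from the answer
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

words <- strsplit(vector, " ")

# Defaults: unique words only, and only words longer than 3 characters
res_default <- vapply(words, myFun, character(1L))
# [1] "hello like code" "Coding"          ""

# Keep duplicates, still dropping short words
res_dups <- vapply(words, myFun, character(1L), onlyUnique = FALSE)
# [1] "hello like code hello" "Coding"                ""

# Keep words of any length, unique only
res_all <- vapply(words, myFun, character(1L), minLen = 0)
# [1] "hello I like to code" "Coding is fun"        "fun"
```

The third string comes back as "" with the default settings because "fun" has exactly 3 characters and the filter keeps only words with more than minLen characters.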
    
