R remove non-alphanumeric symbols from a string

陌清茗 2020-12-06 05:16

I have a string, and I want to remove all non-alphanumeric symbols from it and then put the remaining words into a vector. So this:

"This is a string.  In addition, this is a stri

2 Answers
  • 2020-12-06 05:31

    Here is an example:

    > str <- "This is a string. In addition, this is a string!"
    > str
    [1] "This is a string. In addition, this is a string!"
    > strsplit(gsub("[^[:alnum:] ]", "", str), " +")[[1]]
     [1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"       
    [10] "string"  
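The `[^[:alnum:] ]` class keeps the space on purpose: everything that is neither alphanumeric nor a space is deleted, so the word boundaries survive for `strsplit()`. The sketch below prints that intermediate string and then shows an alternative that is not part of the answer above: extracting the runs of alphanumeric characters directly with base R's `gregexpr()` and `regmatches()`, which needs no separate split step. The variable name `x` is only illustrative.

```r
x <- "This is a string. In addition, this is a string!"

# Step 1 of the answer: delete every character that is neither
# alphanumeric nor a space; the spaces stay behind as word separators.
cleaned <- gsub("[^[:alnum:] ]", "", x)
print(cleaned)

# Alternative: find every run of alphanumeric characters with gregexpr()
# and pull the matched substrings out with regmatches(). The result is a
# list with one element per input string, so [[1]] gives the vector.
words <- regmatches(x, gregexpr("[[:alnum:]]+", x))[[1]]
print(words)
```

The two routes differ on punctuation inside a word: the `gsub()` route turns `"don't"` into `"dont"`, while the extraction route returns `"don"` and `"t"` as separate words.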
    
  • 2020-12-06 05:50

    Another approach, using the stringr package:

    library(stringr)
    text <- "This is a string.  In addition, this is a string!"
    # str_split() returns a list; [[1]] extracts the character vector
    str_split(str_squish(str_replace_all(text, regex("\\W+"), " ")), " ")[[1]]
    #  [1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"        "string"
    
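    • str_replace_all(text, regex("\\W+"), " "): finds each run of non-word characters (word characters are letters, digits, and the underscore) and replaces it with a single space
    • str_squish(): removes whitespace from both ends of a string and reduces repeated whitespace inside it to a single space
    • str_split(): splits a string into pieces at each match of the pattern (here " ") and returns a list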
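The three steps above can be reproduced in base R, which makes each intermediate value visible and avoids the stringr dependency. This is a sketch, not part of the original answer: `gsub()` with `perl = TRUE` stands in for `str_replace_all()`, and `trimws()` plus a whitespace-collapsing `gsub()` stands in for `str_squish()`. It also demonstrates one caveat of `\W`: because the underscore counts as a word character, `_` is not removed, unlike with a `[^[:alnum:]]` class.

```r
text <- "This is a string.  In addition, this is a string!"

# 1. Replace each run of non-word characters with one space
#    (the counterpart of str_replace_all(text, regex("\\W+"), " ")).
step1 <- gsub("\\W+", " ", text, perl = TRUE)

# 2. Trim both ends and collapse internal whitespace (like str_squish()).
step2 <- gsub("\\s+", " ", trimws(step1), perl = TRUE)

# 3. Split on spaces; strsplit() also returns a list, so take [[1]].
words <- strsplit(step2, " ")[[1]]
print(words)

# Caveat: \W treats "_" as a word character, so underscores survive,
# whereas a [^[:alnum:]] class removes them.
with_w     <- gsub("\\W+", " ", "snake_case", perl = TRUE)  # "snake_case"
with_alnum <- gsub("[^[:alnum:]]+", " ", "snake_case")      # "snake case"
```

Step 1 leaves a trailing space where the final `!` was, which is why the trimming in step 2 matters before splitting.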