Extract text name from String

星月不相逢 2020-12-18 16:41

I have a column with values such as

"RED LOBSTER CA04606" or "Red Lobster NewYork WY245", and so on.

How can I extract just the name Red Lobster or Red Lobster N

5 answers
  • 2020-12-18 17:01

    Using a combination of strsplit and grepl

     sapply(strsplit(x, ' '), function(x) paste(x[!grepl('[[:digit:]]',x)], collapse = ' '))
    

    This splits each string on spaces, tests which of the resulting pieces contain digits, and pastes back together only the pieces without digits.
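
    To make the intermediate steps visible, here is a minimal sketch that breaks the one-liner apart. The sample vector x is borrowed from the other answers in this thread, and the names tokens, has_digit, first_name and all_names are only illustrative.

    ```r
    x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

    # Step 1: split each string on spaces; the result is a list of token vectors
    tokens <- strsplit(x, " ")

    # Step 2: for the first string, flag the tokens that contain a digit
    has_digit <- grepl("[[:digit:]]", tokens[[1]])
    # has_digit is FALSE FALSE TRUE

    # Step 3: keep the digit-free tokens and join them back with spaces
    first_name <- paste(tokens[[1]][!has_digit], collapse = " ")

    # The same steps applied to every element at once
    all_names <- sapply(tokens, function(tok) {
      paste(tok[!grepl("[[:digit:]]", tok)], collapse = " ")
    })
    # all_names is c("RED LOBSTER", "Red Lobster NewYork")
    ```

    Because whole tokens are dropped, no trailing space is left behind.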

  • 2020-12-18 17:06

    Alternative gsub version:

    x <- c("RED LOBSTER CA04606","Red Lobster NewYork WY245")
    
    gsub("(.+)\\s+(.+$)","\\1",x)
    [1] "RED LOBSTER"         "Red Lobster NewYork"
    

    and to get the other part of the text:

    gsub("(.+)\\s+(.+$)","\\2",x)
    [1] "CA04606" "WY245"  
    
  • 2020-12-18 17:17

    Try gsub

    > x <- "RED LOBSTER CA04606"
    > gsub("\\S*\\d+\\S*",'', x)
    [1] "RED LOBSTER "
    
    > x<-"Red Lobster NewYork WY245"
    > gsub("\\S*\\d+\\S*",'', x)
    [1] "Red Lobster NewYork "
    
    > x<-"Red Lobster NewYork WY245 BLUE LOBSTER CA04606"
    > gsub("\\S*\\d+\\S*",'', x)
    [1] "Red Lobster NewYork  BLUE LOBSTER "
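
    The results above keep a trailing space, and a double space where a code was removed from the middle. If that matters, one possible cleanup in base R is sketched below; the names no_codes and cleaned are illustrative, and trimws assumes R 3.2.0 or later.

    ```r
    x <- c("RED LOBSTER CA04606",
           "Red Lobster NewYork WY245 BLUE LOBSTER CA04606")

    # Remove every token that contains a digit (same pattern as above)
    no_codes <- gsub("\\S*\\d+\\S*", "", x)

    # Collapse runs of spaces into one, then strip leading/trailing whitespace
    cleaned <- trimws(gsub(" +", " ", no_codes))
    # cleaned is c("RED LOBSTER", "Red Lobster NewYork BLUE LOBSTER")
    ```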
    
  • 2020-12-18 17:17

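    Here is a step-by-step approach:

    mystr <- "Red Lobster NewYork WY245"
    # locate the code: two capital letters followed by digits
    r <- regexpr("[A-Z][A-Z][0-9]+", mystr)
    # extract the matched code (the end index is start + length - 1)
    s <- substr(mystr, r[1], r[1] + attr(r, "match.length") - 1)
    # remove the code, then trim the space it leaves behind
    mystr <- trimws(sub(s, "", mystr, fixed = TRUE))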
    
  • 2020-12-18 17:21

    Since you're trying to use stringr, I recommend str_extract (I'd recommend it even if you weren't trying to use stringr):

    library(stringr)
    x <- c('RED LOBSTER CA04606', 'Red Lobster NewYork WY245')
    str_extract(x, '[a-zA-Z ]+\\b')
    # [1] "RED LOBSTER "          "Red Lobster NewYork "
    

    The '\b' in the regex prevents the 'CA' from 'CA04606' being extracted.

    If you don't like that trailing space you could use str_trim to remove it, or you could modify the regex:

    str_extract(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b')
    # [1] "RED LOBSTER"          "Red Lobster NewYork"
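
    The str_trim alternative mentioned above could look like the following minimal sketch. It assumes the stringr package is installed and reuses the first pattern unchanged; the name trimmed is illustrative.

    ```r
    library(stringr)

    x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

    # Extract the leading run of letters and spaces, then strip the trailing space
    trimmed <- str_trim(str_extract(x, "[a-zA-Z ]+\\b"))
    # trimmed is c("RED LOBSTER", "Red Lobster NewYork")
    ```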
    

    Note: if your string has letters after the post code, the patterns above only return the words before it. So in the example below, to also get the 'NewYork' that follows 'WY245', you can use str_extract_all and paste the results together:

    x <- c(x, 'Red Lobster WY245 NewYork')
    str_extract_all(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b')
    # [[1]]
    # [1] "RED LOBSTER"
    # 
    # [[2]]
    # [1] "Red Lobster NewYork"
    # 
    # [[3]]
    # [1] "Red Lobster" "NewYork"    
    
    # Paste the bits together with paste(..., collapse=' ')
    sapply(str_extract_all(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b'), paste, collapse=' ')
    # [1] "RED LOBSTER"          "Red Lobster NewYork" "Red Lobster NewYork"
    