I have a column with values such as "RED LOBSTER CA04606" or "Red Lobster NewYork WY245", and so on.
How can I extract just the name, i.e. "RED LOBSTER" or "Red Lobster NewYork"?
Using a combination of strsplit and grepl:
sapply(strsplit(x, ' '), function(x) paste(x[!grepl('[[:digit:]]',x)], collapse = ' '))
This splits each string on spaces, tests each piece for digits, and pastes back together only the pieces that contain none.
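A slightly more defensive sketch of the same idea, wrapped in a helper (the name drop_words_with_digits is hypothetical, not from the answer) and using vapply() instead of sapply() so the result is always a plain character vector, even for zero-length input:

```r
# Hypothetical helper name: drop_words_with_digits.
# Splits each string on single spaces, keeps only the tokens that contain
# no digit, and glues the survivors back together with one space.
drop_words_with_digits <- function(x) {
  tokens <- strsplit(x, " ", fixed = TRUE)   # list: one character vector per input string
  vapply(tokens,
         function(tok) paste(tok[!grepl("[[:digit:]]", tok)], collapse = " "),
         character(1))                        # each result must be a single string
}

x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")
drop_words_with_digits(x)
# returns c("RED LOBSTER", "Red Lobster NewYork")
```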
An alternative gsub version:
x <- c("RED LOBSTER CA04606","Red Lobster NewYork WY245")
gsub("(.+)\\s+(.+$)","\\1",x)
[1] "RED LOBSTER" "Red Lobster NewYork"
And to get the other part of the text (the trailing code):
gsub("(.+)\\s+(.+$)","\\2",x)
[1] "CA04606" "WY245"
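Since the values live in a column, here is a sketch that splits them into two new columns with the same pattern, assuming a data frame with a column named place (a hypothetical name). Because (.+) is greedy, the split happens at the last run of whitespace. Caveat: a string with no whitespace does not match at all, so gsub() returns it unchanged in both new columns.

```r
# Assumed layout: a data frame with a character column called `place`.
places <- data.frame(place = c("RED LOBSTER CA04606", "Red Lobster NewYork WY245"),
                     stringsAsFactors = FALSE)

pattern <- "(.+)\\s+(.+$)"
places$name <- gsub(pattern, "\\1", places$place)  # everything before the last space
places$code <- gsub(pattern, "\\2", places$place)  # the final token
places
```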
Try gsub
> x <- "RED LOBSTER CA04606"
> gsub("\\S*\\d+\\S*",'', x)
[1] "RED LOBSTER "
> x<-"Red Lobster NewYork WY245"
> gsub("\\S*\\d+\\S*",'', x)
[1] "Red Lobster NewYork "
> x<-"Red Lobster NewYork WY245 BLUE LOBSTER CA04606"
> gsub("\\S*\\d+\\S*",'', x)
[1] "Red Lobster NewYork  BLUE LOBSTER "
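The gsub() results above keep stray spaces: a trailing one, or a double one where a code sat in the middle of the string. A sketch that tidies them up afterwards, using a hypothetical helper name strip_codes:

```r
# Hypothetical helper name: strip_codes.
# Removes every whitespace-delimited token that contains a digit, then
# squeezes the leftover runs of whitespace and trims both ends.
strip_codes <- function(x) {
  no_codes <- gsub("\\S*\\d+\\S*", "", x)  # same pattern as above
  trimws(gsub("\\s+", " ", no_codes))      # collapse runs to one space, trim ends
}

strip_codes(c("RED LOBSTER CA04606",
              "Red Lobster NewYork WY245",
              "Red Lobster NewYork WY245 BLUE LOBSTER CA04606"))
# returns c("RED LOBSTER", "Red Lobster NewYork", "Red Lobster NewYork BLUE LOBSTER")
```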
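Step by step:
mystr <- "Red Lobster NewYork WY245"
r <- regexpr("[A-Z][A-Z][0-9]+", mystr)                      # start of the code; its length is in attr(r, "match.length")
s <- substr(mystr, r[1], r[1] + attr(r, "match.length") - 1)  # the code itself: "WY245"
mystr <- sub(s, "", mystr, fixed = TRUE)                      # "Red Lobster NewYork " (trailing space remains)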
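Base R can also do this without the start/length arithmetic, because regmatches() returns the matched substring directly. A sketch for a single string, assuming it actually contains a code (with no match, regmatches() returns character(0), so guard for that case before calling sub()):

```r
mystr <- "Red Lobster NewYork WY245"

# Locate the first "two capitals followed by digits" code (same pattern as above).
m <- regexpr("[A-Z][A-Z][0-9]+", mystr)

# regmatches() pulls out the matched text: "WY245".
code <- regmatches(mystr, m)

# fixed = TRUE treats `code` literally rather than as a regex;
# trimws() drops the space that preceded the code.
name <- trimws(sub(code, "", mystr, fixed = TRUE))  # "Red Lobster NewYork"
```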
Since you're trying to use stringr, I recommend str_extract (I'd recommend it even if you weren't trying to use stringr):
library(stringr)
x <- c('RED LOBSTER CA04606', 'Red Lobster NewYork WY245')
str_extract(x, '[a-zA-Z ]+\\b')
# [1] "RED LOBSTER " "Red Lobster NewYork "
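The '\b' (word boundary) in the regex prevents the 'CA' of 'CA04606' from being extracted: there is no word boundary between 'A' and '0', so the match has to end just before 'CA'.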
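If stringr isn't available, the effect of \b can be checked in base R with regexpr(perl = TRUE) and regmatches(). This sketch compares the pattern with and without \b. One difference to keep in mind: regmatches() silently drops elements that have no match, whereas str_extract() returns NA for them.

```r
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

# Without \b, the greedy character class also swallows the letters of the code.
without_b <- regmatches(x, regexpr("[a-zA-Z ]+", x, perl = TRUE))
# c("RED LOBSTER CA", "Red Lobster NewYork WY")

# With \b, the match must end at a word boundary; the engine backs off
# until it finds one (between the space and the first letter of the code).
with_b <- regmatches(x, regexpr("[a-zA-Z ]+\\b", x, perl = TRUE))
# c("RED LOBSTER ", "Red Lobster NewYork ")
```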
If you don't like that trailing space, you can remove it with str_trim, or you can modify the regex:
str_extract(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b')
# [1] "RED LOBSTER" "Red Lobster NewYork"
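The two ways of dropping the trailing space should agree on inputs like these. A base-R sketch that checks this, again using perl = TRUE so the non-capturing group (?: ...) is supported:

```r
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

# Option 1: keep the looser pattern and trim afterwards.
loose   <- regmatches(x, regexpr("[a-zA-Z ]+\\b", x, perl = TRUE))
trimmed <- trimws(loose)

# Option 2: every space must sit between two words, so the match
# can never end in a space.
tight <- regmatches(x, regexpr("[a-zA-Z]+(?: +[a-zA-Z]+)*\\b", x, perl = TRUE))

identical(trimmed, tight)  # TRUE for these inputs
```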
Note: if your string has words after the postcode, the pattern above returns only the words before it. For example, if you want the 'NewYork' that follows 'WY245' in the string below, you can use str_extract_all and paste the results together:
x <- c(x, 'Red Lobster WY245 NewYork')
str_extract_all(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b')
# [[1]]
# [1] "RED LOBSTER"
#
# [[2]]
# [1] "Red Lobster NewYork"
#
# [[3]]
# [1] "Red Lobster" "NewYork"
# Paste the bits together with paste(..., collapse=' ')
sapply(str_extract_all(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b'), paste, collapse=' ')
# [1] "RED LOBSTER" "Red Lobster NewYork" "Red Lobster NewYork"
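A base-R sketch of the same str_extract_all() plus paste() idea, using gregexpr() and regmatches(). An element with no match comes back as an empty string rather than being dropped, because paste() of an empty vector with collapse gives "".

```r
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245", "Red Lobster WY245 NewYork")
pattern <- "[a-zA-Z]+(?: +[a-zA-Z]+)*\\b"

# gregexpr() finds every match; regmatches() returns a list holding
# one character vector of pieces per input string.
pieces <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# Glue each string's pieces back together with single spaces.
cleaned <- vapply(pieces, paste, character(1), collapse = " ")
# c("RED LOBSTER", "Red Lobster NewYork", "Red Lobster NewYork")
```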