Extract text name from String


星月不相逢

I have a column with values such as

"RED LOBSTER CA04606" or "Red Lobster NewYork WY245", and so on.

How can I extract just the name Red Lobster or Red Lobster N


5 Answers

攒了一身酷

2020-12-18 17:01
Using a combination of strsplit and grepl
```
sapply(strsplit(x, ' '), function(tok) paste(tok[!grepl('[[:digit:]]', tok)], collapse = ' '))
```
This splits each string on spaces, tests which of the resulting pieces contain digits, and pastes back together only the pieces without digits.
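For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-filter idea. The helper name `drop_digit_tokens` and the sample vector are assumptions added for illustration; they are not part of the answer above.

```r
# Sample input taken from the question
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

# Hypothetical helper: keep only the space-separated tokens that contain no digit
drop_digit_tokens <- function(strings) {
  tokens <- strsplit(strings, " ", fixed = TRUE)
  vapply(
    tokens,
    function(tok) paste(tok[!grepl("[[:digit:]]", tok)], collapse = " "),
    character(1)
  )
}

result <- drop_digit_tokens(x)
print(result)
# [1] "RED LOBSTER"         "Red Lobster NewYork"
```

Using `vapply` with `character(1)` instead of `sapply` is a design choice here: it guarantees a character vector back even when the input is empty.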

眼角桃花

2020-12-18 17:06

Alternative gsub version:

x <- c("RED LOBSTER CA04606","Red Lobster NewYork WY245")

gsub("(.+)\\s+(.+$)","\\1",x)
[1] "RED LOBSTER"         "Red Lobster NewYork"

and to get the other part of the text:

gsub("(.+)\\s+(.+$)","\\2",x)
[1] "CA04606" "WY245"
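Worth noting: this pattern works by peeling off the last whitespace-separated piece, so it assumes the code always comes last. A rough sketch of that assumption, using simpler `sub` patterns of my own (not the ones from the answer) that should behave the same way on this data:

```r
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

# Drop the last whitespace-separated token (assumed to be the code)
name_part <- sub("\\s+\\S+$", "", x)

# Keep only the last token: the greedy .* swallows everything up to the final whitespace
code_part <- sub("^.*\\s+", "", x)

# Caveat: if the code is not the last token, the wrong piece gets removed
moved <- sub("\\s+\\S+$", "", "Red Lobster WY245 NewYork")

print(name_part)
print(code_part)
print(moved)
```

In the last case the result still contains `WY245`, so a digit-based approach is probably safer when the position of the code can vary.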


被撕碎了的回忆

2020-12-18 17:17

Try gsub

> x <- "RED LOBSTER CA04606"
> gsub("\\S*\\d+\\S*",'', x)
[1] "RED LOBSTER "

> x<-"Red Lobster NewYork WY245"
> gsub("\\S*\\d+\\S*",'', x)
[1] "Red Lobster NewYork "

> x<-"Red Lobster NewYork WY245 BLUE LOBSTER CA04606"
> gsub("\\S*\\d+\\S*",'', x)
[1] "Red Lobster NewYork  BLUE LOBSTER "
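As the output shows, removing the tokens leaves trailing and doubled spaces behind. A possible cleanup step, sketched here as an addition to the answer (the variable names are mine), is to collapse runs of whitespace and trim the ends:

```r
x <- c(
  "RED LOBSTER CA04606",
  "Red Lobster NewYork WY245",
  "Red Lobster NewYork WY245 BLUE LOBSTER CA04606"
)

# Step 1: remove every token that contains at least one digit
stripped <- gsub("\\S*\\d+\\S*", "", x)

# Step 2: collapse repeated whitespace, then trim leading/trailing spaces
cleaned <- trimws(gsub("\\s+", " ", stripped))

print(cleaned)
```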


陌清茗

2020-12-18 17:17

Here is a step-by-step approach:

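mystr <- "Red Lobster NewYork WY245"
# Locate the code: two capital letters followed by digits
r <- regexpr("[A-Z][A-Z][0-9]+", mystr)
# substr() takes an inclusive end position, hence the - 1
s <- substr(mystr, r[1], r[1] + attr(r, "match.length") - 1)
# Remove the code as a literal string (fixed = TRUE), then trim the leftover space
mystr <- trimws(sub(s, "", mystr, fixed = TRUE))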
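The same locate-then-remove idea can be applied to a whole vector at once. This is only a sketch under the assumption that every code looks like two capital letters followed by digits; `code_pattern`, `codes` and `names_only` are names I made up for the example:

```r
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

# Assumed shape of the code: two capital letters followed by one or more digits
code_pattern <- "[A-Z][A-Z][0-9]+"

# Extract the first match in each string
codes <- regmatches(x, regexpr(code_pattern, x))

# Remove the code together with any whitespace directly in front of it
names_only <- sub(paste0("\\s*", code_pattern), "", x)

print(codes)
print(names_only)
```

Note that `regmatches` silently drops elements with no match, so `codes` can end up shorter than `x` if some strings contain no code.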


陌清茗

2020-12-18 17:21

Since you're trying to use stringr, I recommend str_extract (I'd recommend it even if you weren't trying to use stringr):

library(stringr)
x <- c('RED LOBSTER CA04606', 'Red Lobster NewYork WY245')
str_extract(x, '[a-zA-Z ]+\\b')
# [1] "RED LOBSTER "          "Red Lobster NewYork "

The '\b' in the regex prevents the 'CA' from 'CA04606' being extracted.
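To see what the word boundary is doing, here is a small comparison with and without it. I am using base R's `regexpr` with `perl = TRUE` as a stand-in for `str_extract` so the snippet runs without extra packages; that substitution is my assumption, though the two should agree for this simple pattern.

```r
x <- c("RED LOBSTER CA04606", "Red Lobster NewYork WY245")

# Without \b the match runs into the letters at the start of the code
without_boundary <- regmatches(x, regexpr("[a-zA-Z ]+", x, perl = TRUE))

# With \b the match must end at a word boundary, so it backs off to the last space
with_boundary <- regmatches(x, regexpr("[a-zA-Z ]+\\b", x, perl = TRUE))

print(without_boundary)
# [1] "RED LOBSTER CA"         "Red Lobster NewYork WY"
print(with_boundary)
# [1] "RED LOBSTER "         "Red Lobster NewYork "
```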

If you don't like that trailing space you could use str_trim to remove it, or you could modify the regex:

str_extract(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b')
# [1] "RED LOBSTER"          "Red Lobster NewYork"

Note: if your string has more words after the code, the patterns above return only the words before it. So in the example below, if you also want the 'NewYork' that follows 'WY245', you can use str_extract_all and paste the results together:

x <- c(x, 'Red Lobster WY245 NewYork')
str_extract_all(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b')
# [[1]]
# [1] "RED LOBSTER"
# 
# [[2]]
# [1] "Red Lobster NewYork"
# 
# [[3]]
# [1] "Red Lobster" "NewYork"    

# Paste the bits together with paste(..., collapse=' ')
sapply(str_extract_all(x, '[a-zA-Z]+(?: +[a-zA-Z]+)*\\b'), paste, collapse=' ')
# [1] "RED LOBSTER"          "Red Lobster NewYork" "Red Lobster NewYork"
