Matching a word after another word in R regex

Submitted by 冷眼眸甩不掉的悲伤 on 2020-07-08 20:35:39

Question


I have a data frame in R with one column (called 'city') containing a text string. My goal is to extract only one word, i.e. the city name, from that string. The city name always follows the word 'in', e.g. the text might be:

'in London'
'in Manchester'

I tried to create a new column ('municipality'):

df$municipality <- gsub(".*in ?([A-Z]+).*$","\\1",df$city)

This gives me only the first letter following 'in', but I need the next word (ONLY the next word).

I then tried:

gsub(".*in ?([A-Z]\w+))")

which worked on a regex checker, but not in R. Can someone please help me? I know this is probably very simple, but I can't crack it. Thanks in advance.


Answer 1:


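We can use str_extract from the stringr package with a lookbehind, which matches the word that comes right after 'in' and a whitespace character: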

library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')
#[1] "London"     "Manchester"
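If adding stringr is not an option, roughly the same lookbehind can be used from base R. The following is only a sketch: the `city` vector, including the non-matching "Paris" entry, is made up for illustration, and the NA fill-in mirrors what str_extract returns for a non-match.

```r
# Base R sketch of the same lookbehind; `city` is a made-up example vector.
city <- c("in London", "in Manchester", "Paris")

# perl = TRUE is required because lookbehind is a PCRE feature.
m <- regexpr("(?<=in\\s)\\w+", city, perl = TRUE)

# regmatches() drops elements without a match, so start from a vector
# of NA and fill in only the positions where regexpr() found something.
municipality <- rep(NA_character_, length(city))
municipality[m != -1] <- regmatches(city, m)

municipality
#[1] "London"     "Manchester" NA
```

Note that the lookbehind only checks the three characters before the word, so a string such as "Berlin city" would also match; prefixing the pattern with \\b inside the lookbehind is one way to tighten it.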



Answer 2:


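The following regular expression captures the second word from your city column:

^in\\s([^ ]*).*$

This matches the word 'in' at the start of the string, followed by a single whitespace character, followed by a capture group of any non-space characters, which holds the city name. The trailing .*$ consumes the rest of the string, so gsub replaces the whole string with just the captured group.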

Example:

df <- data.frame(city=c("in London town", "in Manchester city"))

df$municipality <- gsub("^in\\s([^ ]*).*$", "\\1", df$city)

> df$municipality
[1] "London"     "Manchester"
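One thing to keep in mind with this approach: when the pattern does not match, sub and gsub return the input string unchanged rather than NA. A minimal sketch of one way to guard against that, assuming a hypothetical third row ("Leeds") that lacks the 'in' prefix:

```r
# Hypothetical data: the third row does not start with "in ".
df <- data.frame(city = c("in London town", "in Manchester city", "Leeds"),
                 stringsAsFactors = FALSE)

pattern <- "^in\\s([^ ]*).*$"

# sub() leaves non-matching strings untouched, so check for a match
# with grepl() first and fall back to NA where there is none.
df$municipality <- ifelse(grepl(pattern, df$city),
                          sub(pattern, "\\1", df$city),
                          NA_character_)

df$municipality
#[1] "London"     "Manchester" NA
```

Since the pattern is anchored with ^ and $, it can match at most once per string, so sub and gsub behave the same here.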


Source: https://stackoverflow.com/questions/34804708/matching-a-word-after-another-word-in-r-regex
