Lookbehind to get the text in R regex [duplicate]

只愿长相守 submitted on 2021-01-28 23:05:47

Question


I have data like this:

Good afternoon. Hello. My bro's name is John... and he said softly 0.8% : "Don't you think I am handsome??" HAHA. jiji. koko.

I would like to get the sentence before the quotation, and the text inside the quotation, by using a lookbehind regex in R.

First: I want to look for quotation marks in a bunch of text.

Second: Look back and extract one sentence before the quotation. If there is no such sentence, that's fine; still extract the text inside the quotation marks.

Below is what I would like to achieve:

My bro's name is John... and he said softly 0.8%: "Don't you think I am handsome??"

I tried the following, but I would like help doing this with a lookbehind regex. Thank you.

regmatches(x, gregexpr('[^\\.]+[\\.\\:]"([^"]*)"', x))

dput :

"Good afternoon. Hello. My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \" HAHA. jiji. koko."

Answer 1:


We can also use gsub (here str1 is the input string). The pattern matches either one or more characters that are not a ., followed by a . and one or more whitespace characters (\\s+), or one or more whitespace characters followed by one or more non-space characters up to the end of the string ($). Every match is replaced with ''.

gsub('[^.]+\\.\\s+|\\s+[^ ]+$', '', str1)
#[1] "My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \""
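To see what each half of the alternation removes, the two alternatives can be run separately on a shorter input. This is only a sketch: demo, step1, step2 and combined are hypothetical names, and the demo string is made up for illustration rather than taken from the question.

```r
# Hypothetical input, not the string from the question.
demo <- 'Hello. He said: "Hi there" HAHA'

# First alternative only: drop a run of non-period characters
# that ends in a period plus whitespace (here "Hello. ").
step1 <- gsub('[^.]+\\.\\s+', '', demo)

# Second alternative only: drop whitespace plus the last
# run of non-space characters at the end of the string (here " HAHA").
step2 <- gsub('\\s+[^ ]+$', '', step1)

# Both alternatives in a single call, as in the pattern above.
combined <- gsub('[^.]+\\.\\s+|\\s+[^ ]+$', '', demo)

print(step1)
print(step2)
print(combined)
```

Because [^.]+ can never cross a period, the first alternative only fires where a period is directly followed by whitespace. Periods inside a sentence (decimals, ellipses) change where it can match, so it is worth checking the result on the real data.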

Or, starting from the beginning of the string (^), we match one or more characters that are not a ., followed by a . and one or more whitespace characters (\\s+). We then capture everything up to and including the quoted text ("[^"]+"), match the remaining characters (.*) to the end of the string, and replace the whole match with the capture group (\\1).

gsub('^[^.]+\\.\\s+(.*(?:"[^"]+")).*$', '\\1', str1, perl=TRUE)
#[1] "My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \""
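Since the question asks specifically for a lookbehind, here is one possible sketch using regmatches and regexpr with perl = TRUE. It is not part of the original answer. It assumes that a sentence boundary is a period preceded by a non-period character and followed by whitespace, so that "John..." and "0.8%" do not end a sentence. The names x, y, pattern and res are hypothetical.

```r
# Input string as given by the question's dput output.
x <- "Good afternoon. Hello. My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \" HAHA. jiji. koko."

# Assumed sentence boundary: non-period, then a period, then whitespace.
#   (?:^|(?<=[^.]\.\s))      start at the string start or right after a boundary
#   (?:(?![^.]\.\s)[^"])*    consume non-quote characters without crossing a boundary
#   "[^"]*"                  the quoted text itself
pattern <- '(?:^|(?<=[^.]\\.\\s))(?:(?![^.]\\.\\s)[^"])*"[^"]*"'

# regexpr finds the first match; regmatches extracts it.
res <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(res)

# With no preceding sentence, the ^ branch still lets the quotation match.
y <- 'He said: "Hi"'
print(regmatches(y, regexpr(pattern, y, perl = TRUE)))
```

The lookbehind has a fixed length of three characters, which PCRE requires. For several quotations in one string, gregexpr could be used in place of regexpr, although the boundary assumption would need to be rechecked against the actual data.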


Source: https://stackoverflow.com/questions/33930738/lookbehind-to-get-the-text-in-r-regex
