Get more than 1 quotations in text paragraph in R regex

Submitted by ℡╲_俬逩灬 on 2019-12-20 04:15:49

Question


First: find the text inside quotation marks, e.g. "I want everything inside here".

Second: extract the one sentence that precedes each quotation.

I would like to get the desired output below using a look-behind regex in R, if possible.

Example:

Yoyo. He is sad. Oh no! "Don't sad!" Yeah: "Testing...  testings," Boys. Sun. Tree... 0.2% green,"LL" "WADD" HOLA.

Desired Output:

[1] Oh no! "Don't sad!"
[2] Yeah: "Testing... testings"
[3] Tree... 0.2% green, "LL"
[4] Tree... 0.2% green, "LL" "WADD"

dput:

"Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing...  testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WAAD\" HOLA."

I tried this, but it doesn't work:

str_extract(t, "(?<=\\.\\s)[^.:]*[.:]\\s*\"[^\"]*\"")

I also tried:

regmatches(t , gregexpr('^[^\\.]+[\\.\\,\\:]\\s+(.*(?:\"[^\"]+\\")).*$', t))

regmatches(t , gregexpr('\"[^\"]*\"(?<=\\s[.?][^\\.\\s])', t))

Tried your method @naurel:

> regmatches(t, regexpr("(?:\"? *([^\"]*))(\"[^\"]*\")", t, perl=T))
[1] " Yoyo. He is sad. Oh no! \"Don't sad!\""

Answer 1:


Since you only want the last sentence before each quotation, I've simplified the regex for you.

Explanation: first, you are looking for something that sits between quotes. If several quoted strings follow one another directly, you want them to match as one.

(\"[^\"]*\"(?: *\"[^\"]*\")*)
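As a minimal sketch of how this first group behaves, the snippet below runs it over the example text with gregexpr and perl = TRUE. The names txt and quote_pattern are illustrative and not from the original post, and the input uses the "WADD" spelling from the example line.

```r
# Example text from the question (using the "WADD" spelling).
txt <- "Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing...  testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WADD\" HOLA."

# One quoted string, optionally followed by more quoted strings
# that are separated only by spaces.
quote_pattern <- "\"[^\"]*\"(?: *\"[^\"]*\")*"

# gregexpr() finds every match; regexpr() would return only the first one.
quotes <- regmatches(txt, gregexpr(quote_pattern, txt, perl = TRUE))[[1]]
print(quotes)
```

On this input the sketch returns three matches, and the adjacent "LL" "WADD" quotes come back as a single element.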

That does the trick. Next, you want to match the sentence before this group. A sentence starts with a capital letter, so we start the match at the nearest capital letter before the previously defined group (i.e. one that is not followed by any other capital letter):

([A-Z](?:[a-z0-9\W\s])*)
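To illustrate why this stops at the nearest capital letter, here is a small sketch that applies the sentence part on its own to a fragment of the example. The names fragment and sentence_pattern are hypothetical, chosen for illustration only.

```r
# A fragment of the example text containing two sentences.
fragment <- "He is sad. Oh no! "

# A capital letter followed by anything that is NOT a capital letter
# (lowercase letters, digits, non-word characters, whitespace).
sentence_pattern <- "[A-Z](?:[a-z0-9\\W\\s])*"

# Each match runs from one capital letter up to just before the next one.
sentences <- regmatches(fragment, gregexpr(sentence_pattern, fragment, perl = TRUE))[[1]]
print(sentences)
```

Because the character class excludes capital letters, every match ends right before the next capital, so a sentence containing a capitalized word in the middle would be split at that word.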

Put it together and you obtain:

([A-Z](?:[a-z0-9\W\s])*)(\"[^\"]*\"(?: *\"[^\"]*\")*)
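A sketch of how the combined pattern might be applied in R follows. It assumes gregexpr with perl = TRUE, since \W inside a character class and (?: ) groups rely on PCRE; the names txt and full_pattern are illustrative rather than part of the original answer.

```r
# Example text from the question (using the "WADD" spelling).
txt <- "Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing...  testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WADD\" HOLA."

# Group 1: the sentence starting at the nearest capital letter.
# Group 2: one or more consecutive quoted strings.
full_pattern <- "([A-Z](?:[a-z0-9\\W\\s])*)(\"[^\"]*\"(?: *\"[^\"]*\")*)"

# Extract every full match (sentence plus its quotation).
result <- regmatches(txt, gregexpr(full_pattern, txt, perl = TRUE))[[1]]
print(result)
```

Note that on this input the sketch gives three results, not the four listed in the desired output: the consecutive quotes "LL" "WADD" are merged into one match, and the original comma and double space inside the second quotation are kept as they appear in the text.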


Source: https://stackoverflow.com/questions/33934605/get-more-than-1-quotations-in-text-paragraph-in-r-regex
