Match strings between delimiting characters

Submitted by 本秂侑毒 on 2019-12-11 05:04:05

Question


Some lines contain strings delimited by an opening and a closing quote (‘ and ’), surrounded by other text, like the ones below. I am trying to find a regex that matches each word or phrase inside the quotes, using the comma as the internal delimiter (or the whole quoted content if there is no comma, as in the case of a single word or phrase). For example, for these phrases:

‘verdichten’
‘verdichten, verstopfen’
‘dunkel, finster, wolkig’
‘fort sein, verloren sein, verloren’
‘von den Nymph ergriffen, verzückt, verrückt’
‘der sich halten kann, halten kann’

The result I would like would be:

[[verdichten]]
[[verdichten]], [[verstopfen]]
[[dunkel]], [[finster]], [[wolkig]]
[[fort sein]], [[verloren sein]], [[verloren]]
[[von den Nymph ergriffen]], [[verzückt]], [[verrückt]]
[[der sich halten kann]], [[halten kann]]

It should work in Notepad++ or EmEditor.

I can match with (‘)(.+?)(’) but I cannot find a way to replace as described.


Answer 1:


One option could be making use of the \G anchor and 2 capturing groups:

(?:‘|\G(?!^))([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,\h*)|’)

In parts

  • (?: Non capturing group
    • ‘ Match the opening quote
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start
  • ) Close non capturing group
  • ( Capture group 1
    • [^,\r\n’]+ Match 1+ times any char except a comma, a newline or ’
  • ) Close group
  • (?=[^\r\n’]*’) Positive lookahead, assert that a closing ’ follows on the same line
  • (?: Non capturing group
    • (,\h*)|’ Either capture a comma and 0+ horizontal whitespace chars in group 2, or match the closing ’
  • ) Close non capturing group
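
To see what the lookahead contributes on its own, here is a minimal Python sketch. It is only an illustration: Python's built-in re module does not support \G or \h, so this checks just the first item after the opening quote, and the names FIRST_ITEM and first_item are made up for this example.

```python
import re

# Opening quote, then one item, accepted only if a closing quote
# follows later on the same line (the positive lookahead).
FIRST_ITEM = re.compile(r"‘([^,\r\n’]+)(?=[^\r\n’]*’)")

def first_item(line):
    """Return the first item after an opening quote, or None if the quote is never closed."""
    match = FIRST_ITEM.search(line)
    return match.group(1) if match else None

closed = first_item("‘dunkel, finster, wolkig’")    # closing quote present
unclosed = first_item("‘dunkel, finster, wolkig")   # closing quote missing

print(closed)    # dunkel
print(unclosed)  # None
```

Without the lookahead, the second line would also yield a match, so text after a stray opening quote would get wrapped.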


In the replacement use:

[[$1]]$2

Output

[[verdichten]]
[[verdichten]], [[verstopfen]]
[[dunkel]], [[finster]], [[wolkig]]
[[fort sein]], [[verloren sein]], [[verloren]]
[[von den Nymph ergriffen]], [[verzückt]], [[verrückt]]
[[der sich halten kann]], [[halten kann]]
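
Outside Notepad++ or EmEditor, the same result can be obtained without \G. The following Python sketch uses a different technique from the regex above: it matches each quoted span once and splits it on commas in a callback. The names QUOTED and wrap_items are made up for this example, and it assumes the quotes are always ‘ and ’ and never nested.

```python
import re

# One quoted span on a single line: ‘ ... ’
QUOTED = re.compile(r"‘([^’\r\n]+)’")

def wrap_items(text: str) -> str:
    """Replace every ‘a, b, c’ span with [[a]], [[b]], [[c]]."""
    def wrap(match: re.Match) -> str:
        # Split the quoted content on commas and trim each item.
        items = [item.strip() for item in match.group(1).split(",")]
        return ", ".join(f"[[{item}]]" for item in items if item)
    return QUOTED.sub(wrap, text)

print(wrap_items("‘fort sein, verloren sein, verloren’"))
# [[fort sein]], [[verloren sein]], [[verloren]]
```

Text outside the quotes is left untouched, so the function can be applied to whole lines.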



Answer 2:


With the help of @The fourth bird's answer, here is a regex that does not include the space characters at the extremities of the matches:

(?:‘|\s*(?!^))([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,)|’)

Replacing with [[$1]]$2

will give the trimmed tokens:

[[verdichten]],[[verstopfen]]
[[dunkel]],[[finster]],[[wolkig]]
[[fort sein]],[[verloren sein]],[[verloren]]
[[von den Nymph ergriffen]],[[verzückt]],[[verrückt]]
[[der sich halten kann]],[[halten kann]]


Edit: For the example you gave where the quoted string is surrounded by other text (test context ‘verdichten’ test context), you can use:

(?:‘|\G\s*(?!^))([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,)|’)



Source: https://stackoverflow.com/questions/58084119/match-strings-between-delimiting-characters
