Using Lookahead and Lookbehind in regex to ignore a word anywhere between a BBCode

折月煮酒 submitted on 2021-02-05 08:25:07

Question


I'm looking to expand on a particular code here:

/(?<![@#]|(\[img\]))\b(".str_replace(" ", "[\-_ ]", $key).")(?!\[\/img\])\b/i

Currently, it checks whether @ or # comes directly before the $key in question (which is fine), OR whether [img] or [/img] comes directly before/after the $key (a problem). I want to add a wildcard so that a $key ANYWHERE between [img] and [/img] is not replaced, while the @ or # check still applies only when the character is directly before the $key. I am aware that wildcards are not allowed in a lookbehind.

Is this possible?

EDIT: I misinterpreted my own code a bit. I realized that [/img] will still trigger even if [img] doesn't precede the word, thus allowing @BLUE[/img] to not trigger. I wish to separate the cases between #/@ and [img][/img]. Assistance on this will greatly help as well.

Basically, everything within [img] and [/img] will ignore preg_replace of the $key, @$key, and #$key. However, even as a standalone @$key and #$key (without [img] tags), $key should not be replaced.
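
To illustrate the limitation described above, here is a minimal sketch that runs the pattern from the question against three sample strings. The key HELLO, the replacement GOODBYE and the sample strings are assumptions made up for the illustration; only the pattern itself comes from the question.

```php
<?php
$key = 'HELLO'; // assumed sample key
$pattern = '/(?<![@#]|(\[img\]))\b(' . str_replace(' ', '[\-_ ]', $key) . ')(?!\[\/img\])\b/i';

$samples = [
    '[img]HELLO[/img]',        // key directly between the tags
    '[img]my HELLO pic[/img]', // key somewhere inside the tags
    '@HELLO',                  // key directly after @
];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_replace($pattern, 'GOODBYE', $sample);
}

echo implode("\n", $results), "\n";
// prints:
// [img]HELLO[/img]
// [img]my GOODBYE pic[/img]
// @HELLO
```

Only the second string is changed: the lookarounds see the characters immediately next to the key, so a key further inside the tags is still replaced.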


Answer 1:


Using lookarounds is not a good way to do that since you can't use a variable length lookbehind.
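
As a quick check of that restriction, the sketch below tries a lookbehind containing .* (a hypothetical attempt at "not preceded by [img] plus anything"). The pattern and sample string are made up for the illustration; the exact warning text depends on the PCRE version bundled with PHP, so the code only tests for the failure.

```php
<?php
// Hypothetical attempt: HELLO not preceded by "[img]" plus any text.
// The .* makes the lookbehind unbounded, so PCRE refuses to compile it.
$pattern = '~(?<!\[img].*)HELLO~s';

// @ silences the compilation warning; preg_match() returns false on error.
$result = @preg_match($pattern, '[img]my HELLO pic[/img]');

var_dump($result === false);
// prints: bool(true)
```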

The goal is to skip content between [img] tags. Here is one way:

$result = preg_replace('~\[(img|url)].*?\[/\1](*SKIP)(*FAIL)|(?<![@#])\bHELLO\b~s',
                       'GOODBYE', $str);

(*SKIP) tells the engine that, if the subpattern to its right fails, it must not retry at any position inside the part of the string already matched to its left; the next attempt starts where (*SKIP) was reached.

(*FAIL) forces the pattern to fail.
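
The effect of the two verbs can be seen by running the same replacement with and without (*SKIP). This is a simplified sketch: the sample string is an assumption, and the patterns are cut down to [img] only, without the @/# check.

```php
<?php
$str = 'HELLO [img]HELLO[/img]'; // assumed sample input

// Without (*SKIP): after (*FAIL) the engine retries at the next positions,
// ends up inside the tag, and the second branch finds the inner HELLO.
$withoutSkip = preg_replace('~\[img].*?\[/img](*FAIL)|HELLO~s', 'GOODBYE', $str);

// With (*SKIP): the next attempt starts after the closing tag.
$withSkip = preg_replace('~\[img].*?\[/img](*SKIP)(*FAIL)|HELLO~s', 'GOODBYE', $str);

echo $withoutSkip, "\n"; // prints: GOODBYE [img]GOODBYE[/img]
echo $withSkip, "\n";    // prints: GOODBYE [img]HELLO[/img]
```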

Since the first branch of the pattern is always tried first and consumes whole [img] (or [url]) tags, the second branch can only match parts of the string that lie outside those tags.
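
To plug this into the $key handling from the question, something like the following sketch could work. The function name replaceOutsideTags and the sample strings are hypothetical; preg_quote() and the i flag are additions assumed here (the flag mirrors the question's pattern) and are not part of the pattern above.

```php
<?php
// Hypothetical helper: replaces $key with $replacement unless the key sits
// inside [img]/[url] tags or directly follows @ or #.
function replaceOutsideTags(string $str, string $key, string $replacement): string
{
    // Quote the key for use in a pattern, then let a space also match "-" or "_",
    // as in the question. preg_quote() leaves spaces untouched.
    $keyPattern = str_replace(' ', '[-_ ]', preg_quote($key, '~'));

    $pattern = '~\[(img|url)].*?\[/\1](*SKIP)(*FAIL)'
             . '|(?<![@#])\b' . $keyPattern . '\b~si';

    // Note: preg_replace() interprets $1 or \1 inside $replacement as references.
    return preg_replace($pattern, $replacement, $str);
}

$str = 'HELLO [img]HELLO.png[/img] @HELLO #HELLO HELLO';
echo replaceOutsideTags($str, 'HELLO', 'GOODBYE'), "\n";
// prints: GOODBYE [img]HELLO.png[/img] @HELLO #HELLO GOODBYE
```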

Another way

You can describe the key you are searching for as a word preceded by any number of [img]...[/img] blocks and other characters that are neither [img] tags nor the key word itself:

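$pattern = <<<'LOD'
~
\G                       # start where the previous match ended (or at the string start)
(?>                      # atomic group: all possible content before "HELLO"
    (?>                      # other characters
        [^[H@#]++                # all characters except [, H, @ and #
      |                        # OR
        \[(?!img]|url])          # a [ not followed by img] or url]
      |                        # OR
        [@#](?:HELLO\b)?         # an @ or #, together with a key that directly follows it
      |                        # OR
        \BH                      # an H that is not at the start of a word
      |                        # OR
        H(?!ELLO\b)              # an H not followed by ELLO and a word boundary
    )+
  |                        # OR
    \[(img|url)].*?\[/\1]    # a complete img or url tag pair with its content
  |                        # OR
    \[                       # an opening img or url tag that is never closed
)*
\K                       # drop everything matched so far from the reported match
(?<![@#])HELLO\b
~sx
LOD;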


Source: https://stackoverflow.com/questions/21410592/using-lookahead-and-lookbehind-in-regex-to-ignore-a-word-anywhere-between-a-bbco
