How to properly escape a backslash to match a literal backslash in single-quoted and double-quoted PHP regex patterns

广开言路 2020-11-28 16:22

To match a literal backslash, many people and the PHP manual say: Always triple escape it, like this \\\\

Note:

2 Answers
  • 2020-11-28 16:46

    A backslash character (\) is treated as an escape character by both PHP's parser and the regular expression engine (PCRE). A single backslash may be consumed by PHP's parser as the start of an escape sequence. Two backslashes are turned into one literal backslash by PHP's parser, but when that backslash reaches the regular expression engine, PCRE treats it as an escape character again. To match one literal backslash, PCRE must therefore receive two backslashes, which means writing four backslash characters in the PHP string literal (in some cases three also work, depending on how you quote the pattern).
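
    A minimal PHP sketch of the two layers, assuming PHP 7 or later; the subject string '\test' is a hypothetical example chosen for illustration:

```php
<?php
// Hypothetical subject: the five characters \test
// (in a single-quoted literal, PHP keeps \t as backslash + "t").
$subject = '\test';

// Four backslashes: PHP turns each \\ into \, so PCRE receives ~\\~,
// which it reads as "one literal backslash". Both quoting styles agree.
var_dump(preg_match('~\\\\~', $subject)); // int(1)
var_dump(preg_match("~\\\\~", $subject)); // int(1)

// Two backslashes before "t": PHP turns '~\\t~' into ~\t~,
// and PCRE reads \t as a TAB, which the subject does not contain.
var_dump(preg_match('~\\t~', $subject));  // int(0)
```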

    To see the difference between the two ways of quoting a pattern, consider the following two var_dump() statements:

    var_dump('~\\\~');
    var_dump("~\\\\~");
    

    Output:

    string(4) "~\\~"
    string(4) "~\\~"
    

    In a single-quoted string, PHP recognizes only two escape sequences, \\ and \', so the sequence \~ has no special meaning and is kept as-is. That is why three backslashes also work here: \\ becomes \, but \~ remains \~.
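
    To check that three and four backslashes produce the same single-quoted string, here is a small sketch (the subject 'C:\dir' is a hypothetical example):

```php
<?php
// Single-quoted literals: \\ becomes \ and the unknown \~ is kept as-is,
// so both literals produce the same 4-character string ~\\~
$three = '~\\\~';
$four  = '~\\\\~';

var_dump($three === $four); // bool(true)
var_dump(strlen($three));   // int(4)

// Either one matches a single literal backslash in the subject.
var_dump(preg_match($three, 'C:\dir')); // int(1)
```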

    Which one should you use:

    For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other form works too, but I think ~\\\\~ is clearer.

  • 2020-11-28 16:46

    There is no difference in how the backslash itself is escaped in single- or double-quoted strings in PHP, as long as you do it correctly. The reason your first example is marked WONT WORK is, as pointed out in the comments, that the double-quoted string expands \t to a tab character.
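
    A sketch of the tab expansion, assuming a double-quoted pattern with three backslashes before t and the hypothetical subject '\test':

```php
<?php
// Hypothetical subject: a backslash followed by "test".
$subject = '\test';

// In double quotes, \\ becomes one backslash and \t becomes a TAB byte,
// so PCRE receives ~\<TAB>~, which matches a TAB rather than "\t".
$tabbed = "~\\\t~";
var_dump(strpos($tabbed, "\t") !== false); // bool(true)
var_dump(preg_match($tabbed, $subject));    // int(0)

// Four backslashes give PCRE ~\\t~: a literal backslash followed by "t".
var_dump(preg_match("~\\\\t~", $subject));  // int(1)
```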

    When you use just three backslashes, the last one in your single-quoted string is read as part of \~, which, as far as single-quoted strings go, is left as it is (since it is not a valid escape sequence). It is, however, just a coincidence that this is parsed as you expect in this case without side effects (e.g. \\\' would not behave the same way, because \' is a valid escape sequence).
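
    The difference between \~ and \' in a single-quoted string shows up when comparing string lengths. A minimal sketch; the subject "it's" is a hypothetical example:

```php
<?php
// \\ becomes \ and the unknown \~ is kept: backslash, backslash, tilde.
$tilde = '\\\~';
// \\ becomes \ and \' becomes a plain quote: the third backslash is consumed.
$quote = '\\\'';

var_dump(strlen($tilde));   // int(3)
var_dump(strlen($quote));   // int(2)
var_dump($quote === "\\'"); // bool(true)

// As a pattern, '~\\\'~' reaches PCRE as ~\'~, which matches an apostrophe
// on its own, with no backslash required in the subject.
var_dump(preg_match('~\\\'~', "it's")); // int(1)
```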

    The reason for all the escaping is that the regular expression also needs backslashes escaped in certain situations, as they have special meaning there as well. This leads to long runs of backslashes, such as \\\\ (which takes eight backslashes for the markdown parser, as it adds yet another level of escaping).
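
    One situation where the regex layer needs its own escaping is matching the literal text \d, which PCRE would otherwise read as its digit class. A sketch with a hypothetical subject; PHP's built-in preg_quote() is shown as an alternative way to add the regex-level escaping:

```php
<?php
// Hypothetical subject containing the literal text \d (no actual digits).
$subject = 'Use \d for digits';

// Two backslashes: PCRE receives \d, its "any digit" class -> no match.
var_dump(preg_match('~\\d~', $subject));   // int(0)

// Four backslashes: PCRE receives \\d, a literal backslash followed by "d".
var_dump(preg_match('~\\\\d~', $subject)); // int(1)

// preg_quote() adds the regex-level escaping for you (here: \d -> \\d).
$pattern = '~' . preg_quote('\d', '~') . '~';
var_dump(preg_match($pattern, $subject));  // int(1)
```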

    Hopefully that clears it up: you seem to be more confused about how backslashes are handled in single- and double-quoted strings than about the behaviour of the regular expression itself (which is the same regardless of " or ', as long as you escape things correctly).
