Can PCRE regex match a null character?

野性不改 2020-12-09 15:38

I have a text source with nulls in it and I need to pull them out along with my regex pattern. Can regex even match a null character?

I only realized I had them w

3 answers
  •  误落风尘
    2020-12-09 16:29

    To add a detail to the previous answer: the PCRE library accepts the pattern as a C NUL-terminated string. (Quoting the PCRE docs: "The pattern is a C string terminated by a binary zero".) That means the pattern cannot contain a literal NUL character; it must always be escaped using the means described in the other answers. The subject string is treated differently: "Unlike the pattern string, the subject may contain binary zeroes." The docs also state: "4. Though binary zero characters are supported in the subject string, they are not allowed in a pattern string because it is passed as a normal C string, terminated by zero. The escape sequence \0 can be used in the pattern to represent a binary zero."
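To illustrate why a literal NUL cannot survive in a C-string pattern, here is a minimal sketch in Python. It does not call PCRE; it uses `ctypes.c_char_p` as a stand-in for any API that reads a NUL-terminated string, and the helper name `as_c_string` is made up for this example.

```python
import ctypes


def as_c_string(pattern: bytes) -> bytes:
    """Return the bytes a reader of a NUL-terminated C string would see.

    Hypothetical helper: c_char_p.value stops at the first zero byte,
    which mimics how a C API would read the pattern.
    """
    return ctypes.c_char_p(pattern).value


# Pattern containing a real NUL byte: everything after it is lost.
literal = b"foo\x00bar"

# Pattern containing the two characters backslash and '0': no NUL byte,
# so the whole pattern reaches the callee, which can interpret the escape.
escaped = rb"foo\0bar"

print(as_c_string(literal))
print(as_c_string(escaped))
```

With the literal NUL, the callee would only ever see `foo`; the escaped form arrives intact, which is why the escape is the only way to express a NUL in such a pattern.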

    NUL is the only character in a PCRE pattern that must be escaped; all others may appear literally: "There is no restriction on the appearance of non-printing characters, apart from the binary zero that terminates a pattern".

    As a final comparative note, some other Perl-compatible regex engines do allow literal NULs in a pattern, for example Python's SRE. E.g. urllib.parse in Python 3 has the following line: _asciire = re.compile('([\x00-\x7f]+)'). Note the lack of the "r" prefix that would mark a raw literal: the unescaping happens at the Python level, so the re module receives the actual characters 0x00 and 0x7f in the pattern.
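The Python behaviour can be checked with a short sketch like the one below. The subject string and variable names are invented for illustration; only the standard `re` module is used.

```python
import re

# Subject text with embedded NUL characters (made-up sample data).
subject = "key\x00value\x00end"

# Non-raw literal: Python turns \x00 into a real NUL before re sees it.
literal_nul = re.compile("\x00")

# Raw literals: re receives the backslash escape and interprets it itself.
escaped_nul = re.compile(r"\x00")
octal_nul = re.compile(r"\0")

fields_literal = literal_nul.split(subject)
fields_escaped = escaped_nul.split(subject)
fields_octal = octal_nul.split(subject)

# A character class spanning NUL to 0x7f, written like the urllib.parse line.
ascii_run = re.compile("([\x00-\x7f]+)").match(subject).group(1)

print(fields_literal, fields_escaped, fields_octal)
print(ascii_run == subject)
```

All three patterns split the subject at the same places, so in Python the choice between a literal NUL and an escape is a matter of readability rather than correctness.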
