I have a text source with nulls in it and I need to pull them out along with my regex pattern. Can regex even match a null character?
I only realized I had them w
To add a detail to the previous answer: the PCRE library accepts the pattern as a C NUL-terminated string. (Quoting the PCRE docs: "The pattern is a C string terminated by a binary zero".) That means the pattern cannot contain a literal NUL character; instead, it must always be escaped using the means described in the other answers. The subject string has no such limit: "Unlike the pattern string, the subject may contain binary zeroes." The docs also state: "Though binary zero characters are supported in the subject string, they are not allowed in a pattern string because it is passed as a normal C string, terminated by zero. The escape sequence \0 can be used in the pattern to represent a binary zero."
NUL is the only character that cannot appear literally in a PCRE pattern; every other character, including non-printing ones, may be written as-is: "There is no restriction on the appearance of non-printing characters, apart from the binary zero that terminates a pattern".
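To illustrate the escaped form, here is a minimal sketch using Python's re module rather than PCRE itself (an assumption made only to keep the example self-contained; the subject string is made up). The raw string means the regex engine receives the four characters \x00 and interprets the escape on its own, so no literal NUL is ever present in the pattern:

```python
import re

# Hypothetical sample subject containing embedded NUL characters.
subject = "abc\x00def\x00\x00ghi"

# Raw string: the pattern text is backslash, x, 0, 0.
# The regex engine, not the Python parser, turns it into "match a NUL".
nul = re.compile(r"\x00")

# Locate every NUL in the subject, then strip them all out.
positions = [m.start() for m in nul.finditer(subject)]
cleaned = nul.sub("", subject)

print(positions)  # [3, 7, 8]
print(cleaned)    # abcdefghi
```

The same escaped pattern text should be safe to hand to an engine that takes NUL-terminated patterns, since it contains only printable characters.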
As a final comparative note, some other Perl-compatible regex engines do allow literal NULs in a pattern, for example Python's SRE. E.g. urllib.parse from Python 3 has the following line: _asciire = re.compile('([\x00-\x7f]+)'). Note the lack of an "r" prefix to mark a raw literal: the unescaping happens at the Python level, so the re module receives the actual characters with values 0x00 and 0x7f in the pattern.
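A small sketch of that difference, using the same pattern text as the urllib.parse line; the variable names and the sample subject are made up for illustration. Both forms compile and match identically; they differ only in who performs the unescaping:

```python
import re

# Non-raw literal: Python unescapes first, so the pattern string
# handed to re contains a real NUL (0x00) and a real DEL (0x7f).
literal = re.compile("([\x00-\x7f]+)")

# Raw literal: re receives the backslash escapes and decodes them itself.
escaped = re.compile(r"([\x00-\x7f]+)")

print("\x00" in literal.pattern)  # True
print("\x00" in escaped.pattern)  # False

# Hypothetical subject: ASCII runs (including a NUL) split by a non-ASCII char.
subject = "a\x00b\u00e9c"
print(literal.findall(subject) == escaped.findall(subject))  # True
```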