I am sure this has been asked before, but I cannot find it.
Basically, assuming you are parsing a text file of unknown origin and want to replace line breaks with so
The regex to find any Unicode line terminator should be
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
rather than as drewk wrote it, at least in Perl. It is taken directly from the Perl 5.10.0 documentation (it was removed in later versions).
Note the braces after \x: U+2029 is \x{2029}, but \x2029 is an ASCII space (U+0020) followed by the digit 2 and the digit 9. Also, \n outside a character class is not guaranteed to match \x{0a}.
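If you are not in Perl, the same idea carries over with small syntax changes. Here is a rough Python sketch, not a definitive implementation. It assumes Python's re module, which writes code points above 0xFF as \uXXXX instead of \x{XXXX}, and it swaps the atomic group for a plain non-capturing group because re only accepts (?>...) from Python 3.11 on. The helper name replace_line_breaks and the default replacement string are made up for illustration.

```python
import re

# Python translation of the Perl pattern above (an assumption, not the
# original author's code): \uXXXX replaces \x{XXXX}, and (?:...) replaces
# the atomic group (?>...), which Python's re only supports from 3.11.
LINE_TERMINATOR = re.compile(r"(?:\x0D\x0A?|[\x0A-\x0C\x85\u2028\u2029])")


def replace_line_breaks(text: str, replacement: str = " ") -> str:
    """Replace each line terminator with `replacement` (hypothetical helper)."""
    # CR followed by LF is consumed as a single terminator because the
    # optional \x0A is matched greedily.
    return LINE_TERMINATOR.sub(replacement, text)


sample = "a\r\nb\rc\nd\x85e\u2028f\u2029g"
print(replace_line_breaks(sample, "|"))  # a|b|c|d|e|f|g

# The two-hex-digit pitfall exists here too: \x2029 is parsed as \x20
# (a space) followed by the literal characters "2" and "9".
print(bool(re.fullmatch(r"\x2029", " 29")))     # True
print(bool(re.fullmatch(r"\x2029", "\u2029")))  # False
```

Note that CRLF collapses to a single replacement rather than two, which is usually what you want when normalising files of unknown origin.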