I have cases where user-entered data from an HTML textarea or input is sometimes sent with \u00a0 (non-breaking spaces) instead of regular spaces when encoded as UTF-8.
For whatever reason \s doesn't match \u00a0.
I think the "whatever reason" is that it is not supposed to. Only the POSIX bracket expressions and the \p{...} property classes are Unicode aware. The character-class abbreviations are not:

| Sequence | As [...] | Meaning |
|----------|----------|---------|
| `\d` | `[0-9]` | ASCII decimal digit character |
| `\D` | `[^0-9]` | Any character except a digit |
| `\h` | `[0-9a-fA-F]` | Hexadecimal digit character |
| `\H` | `[^0-9a-fA-F]` | Any character except a hex digit |
| `\s` | `[ \t\r\n\f]` | ASCII whitespace character |
| `\S` | `[^ \t\r\n\f]` | Any character except whitespace |
| `\w` | `[A-Za-z0-9_]` | ASCII word character |
| `\W` | `[^A-Za-z0-9_]` | Any character except a word character |
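
To make the difference concrete, here is a minimal sketch, assuming Ruby 1.9 or later and a UTF-8 string (the `\h` hex-digit shorthand in the table suggests Ruby's regex engine). The sample string and the helper name `normalize_nbsp` are made up for illustration; they are not from the question.

```ruby
# Hypothetical sample input: "foo", a non-breaking space (U+00A0), then "bar".
input = "foo\u00a0bar"

# The \s shorthand is ASCII-only, so it finds no whitespace here.
p input =~ /\s/            # => nil

# The POSIX bracket expression and the \p property class are Unicode aware,
# so both match the non-breaking space at character index 3.
p input =~ /[[:space:]]/   # => 3
p input =~ /\p{Space}/     # => 3

# Hypothetical helper: replace each non-breaking space with a plain space
# so that later \s-based patterns behave as expected.
def normalize_nbsp(text)
  text.gsub("\u00a0", " ")
end

p normalize_nbsp(input)    # => "foo bar"
```

Whether to normalize the input up front or switch the patterns to `[[:space:]]` is a judgment call: normalizing keeps existing `\s` patterns working, while `[[:space:]]` also catches other Unicode whitespace that may show up in user input.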