How to replace a decoded non-breaking space (nbsp)

我寻月下人不归 2020-12-05 23:39

Suppose I have a string "a s d d" that htmlentities() turns into "a&nbsp;s&nbsp;d&nbsp;d".

How do I replace these decoded non-breaking spaces?
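
A minimal PHP sketch of the situation in the question, assuming the string contains UTF-8 non-breaking spaces (U+00A0) where it appears to contain regular spaces. htmlentities() exposes them as &nbsp;, and html_entity_decode() turns each &nbsp; back into the two bytes C2 A0, not into a plain space. The explicit flags and the 'UTF-8' charset are chosen here for reproducibility; the question does not specify them.

```php
<?php
// Assumption: the input contains UTF-8 non-breaking spaces (U+00A0), not ASCII spaces.
$original = "a\u{00A0}s\u{00A0}d\u{00A0}d";

// On screen this looks like "a s d d", but htmlentities() exposes the nbsp characters.
$encoded = htmlentities($original, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $encoded, PHP_EOL; // a&nbsp;s&nbsp;d&nbsp;d

// Decoding &nbsp; gives back U+00A0, i.e. the two bytes C2 A0, not a plain space (0x20).
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo bin2hex($decoded), PHP_EOL; // 61c2a073c2a064c2a064
```

This is why a search for an ordinary space, or for &nbsp; in the decoded text, finds nothing.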

2 Answers
  • 2020-12-06 00:05

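    Replace every run of whitespace of any kind with a single regular space. With the /u modifier, \s also matches Unicode whitespace such as the non-breaking space: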

    preg_replace("/\s+/u", " ", $str);
    

    https://stackoverflow.com/a/40264711/635364

    FYI, the sanitization filters of PHP's filter_var() have no filter for these whitespace characters.
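
A quick check of this approach, assuming UTF-8 input. The sample string is made up for illustration: it mixes non-breaking spaces with a tab and a newline to show that each run of whitespace collapses into one regular space.

```php
<?php
// Hypothetical sample input: NBSP (U+00A0), two tabs, and a newline followed by NBSP.
$str = "a\u{00A0}s\t\td\n\u{00A0}d";

// With /u, PHP compiles the pattern in UTF-8 mode with Unicode properties,
// so \s also matches U+00A0. Each whitespace run becomes one regular space.
$clean = preg_replace("/\s+/u", " ", $str);

var_dump($clean); // string(7) "a s d d"
```

Because /u makes PCRE treat the subject as UTF-8, preg_replace() returns null instead of a string when $str is not valid UTF-8, so production code may want to check the result.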

  • 2020-12-06 00:27

    The problem is that you are specifying the non-breaking space incorrectly. In UTF-8, the non-breaking space (U+00A0) is encoded as 0xC2A0, which consists of two bytes: C2 (194) and A0 (160). You're specifying only half of the character's code.
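
To illustrate the two-byte point, this sketch compares replacing only the A0 byte with replacing the full C2 A0 sequence. chr(160) is an assumption standing in for however the question searched for the single byte; the answer only says that half of the code was used.

```php
<?php
$nbsp = "\u{00A0}"; // U+00A0 NO-BREAK SPACE

// In UTF-8 the character is two bytes: C2 (194) and A0 (160).
echo strlen($nbsp), ' ', bin2hex($nbsp), PHP_EOL; // 2 c2a0

$original = "a{$nbsp}s";

// Replacing only the A0 byte leaves the C2 lead byte behind,
// which is no longer valid UTF-8.
$half = str_replace(chr(160), ' ', $original);
echo bin2hex($half), PHP_EOL; // 61c22073

// Replacing the full two-byte sequence gives the expected result.
$full = str_replace("\xc2\xa0", ' ', $original);
echo $full, PHP_EOL; // a s
```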

    You can replace it with the simple (and fast) str_replace or with a more flexible regular expression, depending on your needs:

    // faster solution
    $regular_spaces = str_replace("\xc2\xa0", ' ', $original_string);
    
    // more flexible solution
    $regular_spaces = preg_replace('/\xc2\xa0/', ' ', $original_string);
    

    Note that with str_replace you have to enclose the search string in double quotes ("). str_replace does not understand textual representations of character codes, so those codes must be converted into the actual characters first. PHP does that automatically for double-quoted strings: special sequences (e.g. the newline \n, or character codes such as \xc2) are replaced by the actual characters (e.g. byte 0x0A for \n) before the string value is used.

    In contrast, preg_replace understands textual representations of character codes itself (the regular expression engine interprets \xc2 and \xa0), so PHP doesn't need to convert them first and you can enclose the pattern in single quotes (') in this case.
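
The quoting rules described in the two paragraphs above can be checked directly. This sketch, using a made-up sample string, shows that the single-quoted search string is eight literal characters that str_replace never finds, while the double-quoted string and the single-quoted regular expression both match the non-breaking space.

```php
<?php
$original = "a\xc2\xa0b"; // "a", NBSP (bytes C2 A0), "b"

// Single-quoted: PHP keeps the backslashes, so this is the 8-character
// text \xc2\xa0, which never occurs in $original.
$single = '\xc2\xa0';
echo strlen($single), PHP_EOL; // 8
echo str_replace($single, ' ', $original) === $original ? 'unchanged' : 'replaced', PHP_EOL; // unchanged

// Double-quoted: PHP turns the escapes into the two actual bytes.
echo strlen("\xc2\xa0"), PHP_EOL; // 2
echo str_replace("\xc2\xa0", ' ', $original), PHP_EOL; // a b

// preg_replace: the regex engine interprets \xc2 and \xa0 itself,
// so a single-quoted pattern works.
echo preg_replace('/\xc2\xa0/', ' ', $original), PHP_EOL; // a b
```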

    UTF-8 is a so-called variable-width character encoding, which means each character is encoded as one to four 8-bit bytes. In general, more frequently used characters have shorter codes, while more exotic characters have longer ones: ASCII characters take one byte, and characters from U+0080 to U+07FF, including U+00A0, take two.
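
To make the variable-width point concrete, this sketch prints the byte length of one character from each UTF-8 length class. The sample characters are arbitrary choices, not taken from the answer.

```php
<?php
// One character from each UTF-8 length class (written as \u{...} escapes).
$samples = [
    'U+0041 LATIN CAPITAL LETTER A' => "\u{0041}",
    'U+00A0 NO-BREAK SPACE'         => "\u{00A0}",
    'U+20AC EURO SIGN'              => "\u{20AC}",
    'U+1F600 GRINNING FACE'         => "\u{1F600}",
];

foreach ($samples as $name => $char) {
    // strlen() counts bytes, not characters.
    printf("%-30s %d byte(s): %s\n", $name, strlen($char), bin2hex($char));
}
```

Since strlen() counts bytes, a UTF-8 string containing non-breaking spaces is longer than it looks on screen.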
