array_filter doesn't seem to work for words containing an apostrophe or a dash

时光取名叫无心 2020-12-12 07:36

I have PHP code as shown below:

$variable = \CTIME\DataPoint\get_message();  // Line A
echo '<pre>'; print_r($variable); echo '</pre>';


        
3 answers
  • 2020-12-12 07:55

    You have an encoding mismatch, and I will wager that it's between UTF-8 and MS cp1252.

    cp1252 is a single-byte encoding that Microsoft uses, and it is frequently confused with ISO8859-1. While many of the codepoints map to the same glyphs in both, there are some notable differences, such as the right single quotation mark (’), which gives it away because it occurs only in cp1252.

    If you look at the byte values of strings you'll see:

    • cp1252: \x92
    • UTF-8: \xE2\x80\x99

    Which is why you're having trouble matching.
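To make the mismatch concrete, here is a minimal sketch that builds the same visible word from both byte sequences and compares them. The sample word and the `$allowed` list are assumptions for illustration; the asker's actual filter is not shown in the question.

```php
<?php
// The same visible character (’, U+2019) written as raw bytes in two encodings.
$cp1252 = "Aujourd\x92hui";          // Windows-1252: one byte for the quote
$utf8   = "Aujourd\xE2\x80\x99hui";  // UTF-8: three bytes for the quote

// Dump the bytes so the difference is visible.
echo bin2hex($cp1252), PHP_EOL;
echo bin2hex($utf8), PHP_EOL;

// PHP compares strings byte by byte, so the two are not equal.
var_dump($cp1252 === $utf8); // bool(false)

// Hypothetical filter: keep only values found in an allow-list.
$allowed = [$utf8];
$matches = array_filter([$cp1252], function ($s) use ($allowed) {
    return in_array($s, $allowed, true);
});
var_dump(count($matches)); // int(0)
```

The filter returns nothing even though both strings would render identically on screen.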

    You're going to want to make this post your new religion: UTF-8 all the way through

    Without re-hashing that gospel truth, my over-arching recommendations on dealing with character encodings are:

    1. Never assume a character encoding, always set it explicitly.
    2. Never attempt to detect a character encoding, as it is virtually impossible to do with any level of accuracy.
    3. Never use utf8_encode() or utf8_decode().
      • They will only ever convert between ISO8859-1 and UTF-8.
      • They do not make any attempt to check if the input is the encoding they expect. [see point 2]
      • Even when they encounter a detectably invalid byte sequence, they do not make any attempt to care and will simply introduce trash into the output.
    4. Always use functions like mb_convert_encoding() or iconv(), and always specify both the input and output encodings. [see point 1]
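As a sketch of point 4, assuming the incoming data is known to be Windows-1252 (that is an assumption here, not something the question confirms) and that the mbstring extension is loaded, the conversion could look like this:

```php
<?php
// Assumption: these bytes came from a source known to use Windows-1252.
$fromSource = "Aujourd\x92hui";

// Target encoding first, source encoding second; both stated explicitly.
$converted = mb_convert_encoding($fromSource, 'UTF-8', 'Windows-1252');

// A literal written in UTF-8, as it would appear in a UTF-8 .php file.
$literal = "Aujourd\u{2019}hui";

var_dump($converted === $literal); // bool(true)
```

After conversion both sides are UTF-8, so ordinary string comparison works again.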
  • 2020-12-12 08:01

    Your code seems good.
    My guess is that this is a charset issue: the "right single quotation mark" (’) is a Unicode character and is not part of the ASCII charset.

    If the string from the source data and the string in your PHP script use different charsets, they may not be the same sequence of bytes, even though they look identical.

    For instance, if you're using UTF-8: check that the data you're fetching with get_live_today_streams() is UTF-8 encoded, and make sure that your .php file is UTF-8 encoded as well.
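One way to run that check, sketched with made-up sample values and assuming the mbstring extension is available, is `mb_check_encoding()`. It only validates that the bytes are well-formed UTF-8; it does not prove what encoding the data was written in.

```php
<?php
// Hypothetical sample values standing in for fetched data.
$samples = [
    'utf8'   => "Aujourd\xE2\x80\x99hui", // well-formed UTF-8
    'cp1252' => "Aujourd\x92hui",         // lone 0x92 byte, not valid UTF-8
];

foreach ($samples as $label => $value) {
    // true when the bytes form valid UTF-8, false otherwise
    $ok = mb_check_encoding($value, 'UTF-8');
    echo $label, ': ', $ok ? 'valid UTF-8' : 'not valid UTF-8', PHP_EOL;
}
```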

    (Have a look at this post to see how to convert an ANSI file to UTF-8 using Notepad++)

  • 2020-12-12 08:16

    If your string always contains Hello and Aujourd hui, use this regex workaround. Note the u flag, which makes the pattern multibyte-aware so that a dot can match the dash and quote characters.

    https://3v4l.org/Xe0Rg
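The linked snippet is not reproduced here. As a rough sketch of the same idea, with a hypothetical pattern and sample words that are assumed to be UTF-8 encoded:

```php
<?php
// Hypothetical sample values, all UTF-8 encoded.
$words = ["Aujourd\u{2019}hui", "Aujourd'hui", "Aujourdhui"];

// With /u, "." matches one whole code point, so it covers both ' and ’.
$kept = array_filter($words, function ($w) {
    return preg_match('/^Aujourd.hui$/u', $w) === 1;
});
print_r(array_values($kept)); // the first two words

// Without /u, "." matches a single byte and the three-byte ’ is missed.
$withoutFlag = preg_match('/^Aujourd.hui$/', "Aujourd\u{2019}hui");
var_dump($withoutFlag); // int(0)
```

Note that with the u flag `preg_match()` returns false on input that is not valid UTF-8, so this workaround does not help if the data is actually cp1252.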
