strlen() and UTF-8 encoding

挽巷 2020-12-05 20:19

Assuming UTF-8 encoding, and strlen() in PHP, is it possible that this string has a length of 4?

I'm only interested to know about strlen(), not other functions.

6 answers
  •  萌比男神i
    2020-12-05 20:49

    It's likely that at some point between the preparation of the question and your reading of it some process has mangled non-ASCII characters in it, so the question was originally about some string with 4 characters in it.

    The sequence ï¿½ is obtained when you encode the replacement character U+FFFD (�) in UTF-8 and interpret the result in latin1. This character is used as a replacement for byte sequences that don't encode any character when reading text from a file, for example. What has happened is likely this:

    The original question, stored in a latin1 text file, had: $1¢2 (you can replace ¢ with any non-ASCII character)

    The file was read by a program that used UTF-8. Since the byte corresponding to ¢ could not be interpreted, the program substituted it and read the text $1�2. This text was then written out using UTF-8, resulting in $1\xEF\xBF\xBD2 in the file.

    Then a third program reads the file as latin1 and displays $1ï¿½2.
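
The three steps above can be reproduced at the byte level. The following is a minimal sketch in Python rather than PHP, assuming the example string $1¢2 from the answer; the variable names are illustrative only. Since PHP's strlen() returns the number of bytes in a string, the byte counts computed here are what strlen() would report for the same data.

```python
# Assumed example string from the answer; \u00a2 is the cent sign and
# stands in for any non-ASCII character that latin1 can represent.
original = "$1\u00a22"

# Step 1: the text is stored as latin1, one byte per character.
latin1_bytes = original.encode("latin-1")            # b'$1\xa22'

# Step 2: a UTF-8 reader cannot interpret the lone byte 0xA2, substitutes
# U+FFFD for it, and the text is then written back out as UTF-8.
misread = latin1_bytes.decode("utf-8", errors="replace")
rewritten_bytes = misread.encode("utf-8")            # b'$1\xef\xbf\xbd2'

# Step 3: a latin1 reader shows each of the three UTF-8 bytes of U+FFFD
# as a separate character.
displayed = rewritten_bytes.decode("latin-1")

# Byte counts at each stage (what a byte-counting length function sees).
byte_counts = {
    "latin1 original": len(latin1_bytes),
    "original as UTF-8": len(original.encode("utf-8")),
    "after replacement, UTF-8": len(rewritten_bytes),
    "mangled text re-saved as UTF-8": len(displayed.encode("utf-8")),
}

for stage, count in byte_counts.items():
    print(f"{stage}: {count} bytes")
```

Under these assumptions, only the latin1 form of the string is 4 bytes long; any UTF-8 form containing a non-ASCII character is longer, because such a character takes at least two bytes in UTF-8.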
