Am I correctly supporting UTF-8 in my PHP apps?

盖世英雄少女心 2020-11-30 20:39

I would like to make sure that everything I know about UTF-8 is correct. I have been trying to use UTF-8 for a while now but I keep stumbling across more and more bugs and o

5 answers
  •  半阙折子戏
    2020-11-30 21:11

    Most of what you are doing now should be correct.

    Some notes: any utf8_* collation in MySQL will store your data correctly as UTF-8; the only difference between them is the collation (the comparison and alphabetical ordering rules) applied when sorting.

    You can tell Apache and PHP to issue the correct charset headers by setting AddDefaultCharset utf-8 in httpd.conf/.htaccess and default_charset = "utf-8" in php.ini, respectively.
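
    If the server configuration is out of reach, roughly the same effect can be had at runtime. This is only a minimal sketch, assuming it runs before the script has sent any output:

```php
<?php
// Runtime equivalent of default_charset = "utf-8" in php.ini.
ini_set('default_charset', 'UTF-8');

// Explicit header for this response; guarded because header()
// emits a warning once output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$charset = ini_get('default_charset');
```

    The explicit header() call is redundant when default_charset is already set, but it makes the intent visible in the code.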

    You can tell the mbstring extension to take care of the string functions. This works for me:

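    ; Treat strings as UTF-8 internally and on output
    mbstring.internal_encoding=UTF-8
    mbstring.http_output=UTF-8
    ; Convert incoming request data to the internal encoding
    mbstring.encoding_translation=On
    ; 2 (string functions) + 4 (regex functions); bit 1 (mail) stays off
    ; Note: func_overload was deprecated in PHP 7.2 and removed in PHP 8.0
    mbstring.func_overload=6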
    

    (this leaves the mail() function untouched - I found setting it to 7 played havoc with my mail headers)
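
    To see what the overloading changes, it may help to compare the byte-oriented functions with their mb_* counterparts called explicitly. A minimal sketch, assuming PHP 7 or later (for the \u{...} escape) with the mbstring extension loaded; the sample string is made up:

```php
<?php
// "héllo": the é (U+00E9) occupies two bytes in UTF-8.
$s = "h\u{00E9}llo";

$bytes = strlen($s);                     // counts bytes
$chars = mb_strlen($s, 'UTF-8');         // counts characters

$brokenPrefix = substr($s, 0, 2);        // cuts the é in half
$prefix = mb_substr($s, 0, 2, 'UTF-8');  // keeps the é intact

$upper = mb_strtoupper($s, 'UTF-8');     // also uppercases the é

printf("%d bytes, %d characters\n", $bytes, $chars);
```

    Passing the encoding explicitly to each call avoids depending on whatever internal encoding happens to be configured.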

    For charset conversion take a look at https://sourceforge.net/projects/phputf8/.

    PHP doesn't care at all about what's in the variable; it just blindly stores and retrieves its content.

    You'll get unexpected results if you declare one mbstring.internal_encoding and pass strings in another encoding to an mb_* function. You can, however, safely send ASCII to UTF-8 functions, because ASCII is a subset of UTF-8.
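
    The mismatch is easiest to see with a conversion. The following is a sketch under the assumption that the input really is ISO-8859-1; the sample strings are invented for illustration:

```php
<?php
// "café" as ISO-8859-1 bytes: é is the single byte 0xE9.
$latin1 = "caf\xE9";

// Correct: name the real source encoding when converting.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Wrong: data that is already UTF-8 is declared as ISO-8859-1,
// so each of the two bytes of é is re-encoded on its own.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// ASCII is byte-for-byte identical in UTF-8, so it is always valid.
$ascii = 'hello';
$asciiIsValidUtf8 = mb_check_encoding($ascii, 'UTF-8');

echo bin2hex($utf8), "\n";
echo bin2hex($doubleEncoded), "\n";
```

    The second conversion produces the familiar double-encoded text in which one accented letter turns into two unrelated characters.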

    If you're worried about somebody posting incorrectly encoded data on purpose, I believe you should consider HTML Purifier to filter GET/POST data before processing.
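
    Independently of HTML Purifier, a plain encoding check can reject malformed input early. This sketch is not a substitute for a real filter; all_valid_utf8() is a hypothetical helper name, and it only verifies that strings are well-formed UTF-8:

```php
<?php
// Hypothetical helper: true only if every string key and value
// in $data (including nested arrays) is well-formed UTF-8.
function all_valid_utf8(array $data): bool
{
    foreach ($data as $key => $value) {
        if (is_string($key) && !mb_check_encoding($key, 'UTF-8')) {
            return false;
        }
        if (is_array($value)) {
            if (!all_valid_utf8($value)) {
                return false;
            }
        } elseif (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            return false;
        }
    }
    return true;
}

$good = ['name' => "Zo\u{00EB}", 'tags' => ['a', 'b']];
$bad  = ['name' => "Zo\xEB"]; // a lone 0xEB byte is not valid UTF-8

$goodOk = all_valid_utf8($good);
$badOk  = all_valid_utf8($bad);
```

    In a request handler this could be applied to $_GET and $_POST, responding with an error when the check fails.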

    Accept-charset has been in the specs since forever, but its real-world support in browsers is more or less zero. The browser will typically use the encoding of the page containing the form.

    UTF-16 is not the big brother of UTF-8, it just serves a different purpose.
