I would like to make sure that everything I know about UTF-8 is correct. I have been trying to use UTF-8 for a while now but I keep stumbling across more and more bugs and o
Most of what you are doing now should be correct.
Some notes: any utf8_* collation in MySQL will store your data correctly as UTF-8. The only difference between them is the collation, that is, the rules applied when comparing and sorting strings (alphabetical order).
You can tell Apache and PHP to issue the correct charset headers by setting AddDefaultCharset utf-8 in httpd.conf/.htaccess and default_charset = "utf-8" in php.ini respectively.
You can tell the mbstring extension to take care of the string functions. This works for me:
mbstring.internal_encoding=utf-8
mbstring.http_output=UTF-8
mbstring.encoding_translation=On
mbstring.func_overload=6
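(A value of 6 overloads the string and regex functions but leaves the mail() function untouched; I found that setting it to 7 played havoc with my mail headers. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so this setting only applies to older versions.)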
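The difference between 6 and 7 is a matter of bit flags: per the PHP manual, 1 covers mail(), 2 the string functions and 4 the regex functions. Here is a small sketch of that arithmetic, written in Python purely for illustration (the helper name describe_overload is made up for this example):

```python
# Bit flags for mbstring.func_overload as documented in the PHP manual.
OVERLOAD_FLAGS = {1: "mail", 2: "string", 4: "regex"}

def describe_overload(value):
    """Return the function groups that a given func_overload value covers."""
    return [name for bit, name in OVERLOAD_FLAGS.items() if value & bit]

print(describe_overload(6))  # ['string', 'regex']
print(describe_overload(7))  # ['mail', 'string', 'regex']
```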
For charset conversion take a look at https://sourceforge.net/projects/phputf8/.
PHP doesn't care at all about what's in the variable: a string is just a sequence of bytes, and PHP stores and retrieves its content blindly.
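To see why "just bytes" matters, here is a rough illustration in Python (not PHP) of the gap between characters and bytes. A byte-oriented function such as PHP's plain strlen() reports the byte count, while a multibyte-aware one reports the character count:

```python
text = "na\u00efve"             # "naive" with a diaeresis on the i
raw = text.encode("utf-8")      # the bytes a byte-oriented runtime stores

char_count = len(text)          # 5 characters
byte_count = len(raw)           # 6 bytes: the accented i takes two bytes in UTF-8

# Cutting at a byte offset can split a character in half.
truncated = raw[:3]             # b"na" plus only the first byte of the accented i

print(char_count, byte_count, truncated)
```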
You'll get unexpected results if you declare one mbstring.internal_encoding and then pass strings in a different encoding to an mb_* function. You can, however, safely pass ASCII to UTF-8 functions, because ASCII is a subset of UTF-8.
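A quick sketch of both halves of that statement, again in Python for illustration only. Python raises an error on the mismatched input, whereas a PHP mb_* function would more likely return wrong results without complaining:

```python
ascii_bytes = "hello".encode("ascii")
latin1_bytes = "caf\u00e9".encode("latin-1")   # b"caf\xe9"

# ASCII bytes are already valid UTF-8 and decode to the same text.
ascii_as_utf8 = ascii_bytes.decode("utf-8")

# Latin-1 bytes are not: 0xE9 opens a multi-byte sequence that never completes.
try:
    latin1_bytes.decode("utf-8")
    latin1_ok = True
except UnicodeDecodeError:
    latin1_ok = False

print(ascii_as_utf8, latin1_ok)
```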
If you're worried about somebody posting incorrectly encoded data on purpose, I believe you should consider HTML Purifier to filter GET/POST data before processing it.
Accept-charset has been in the specs since forever, but its real-world support in browsers is more or less zero. The browser will typically use the encoding of the page containing the form.
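As a rough idea of what "incorrectly encoded" means in practice, here is a minimal validity check, sketched in Python. It is not what HTML Purifier does internally; it only shows the kind of test involved, and the function name is_valid_utf8 is invented for this example. In PHP, mb_check_encoding($input, 'UTF-8') serves a similar purpose.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("caf\u00e9".encode("utf-8")))  # well-formed input
print(is_valid_utf8(b"\xc0\xaf"))                  # overlong encoding of "/"
print(is_valid_utf8(b"\xff"))                      # byte that never occurs in UTF-8
```

Input that fails a check like this is best rejected or re-encoded before it reaches the database.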
UTF-16 is not the big brother of UTF-8, it just serves a different purpose.
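One way to see that neither encoding is simply "bigger" is to compare encoded sizes for different scripts. This is a small Python sketch and the sample strings are arbitrary:

```python
samples = {
    "ascii": "hello",
    "greek": "\u03b1\u03b2\u03b3",   # alpha, beta, gamma
    "cjk": "\u65e5\u672c\u8a9e",     # three CJK characters
}

sizes = {}
for name, text in samples.items():
    utf8_len = len(text.encode("utf-8"))
    utf16_len = len(text.encode("utf-16-le"))  # the -le variant omits the 2-byte BOM
    sizes[name] = (utf8_len, utf16_len)

print(sizes)
```

ASCII text is half the size in UTF-8, the Greek sample comes out the same in both, and the CJK sample is smaller in UTF-16.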