It seems to be common knowledge to use mysql_set_charset / mysqli::set_charset instead of the direct MySQL query set names.
The reason often cited is that se
Two things must be done (in this area):
1. Escape the special characters in any string that goes into an SQL statement.
2. Establish the encoding in the client, so that INSERTs/SELECTs will know how to change the bytes during the write/read.
The first needs to escape apostrophe and double-quote, since both of those are acceptable quote marks for strings in MySQL syntax. Then the escape character itself (backslash) needs escaping. Those 3 characters are sufficient for most applications. However, if you are trying to escape a BLOB (such as a .jpg), various control characters may cause trouble. You are probably better off converting to hex, then using UNHEX(), to avoid issues. Note: Nothing is mentioned here about character sets. If you aren't dealing with BLOBs, you can get away with PHP's addslashes().
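A minimal sketch of the escaping step, assuming PHP. The table and column names (people.name, images.data) and the sample values are hypothetical; the code only builds SQL strings and never connects to a server.

```php
<?php
// Hypothetical input containing the three characters discussed above:
// apostrophe, double-quote, and backslash.
$name = 'O\'Reilly said "hi" \\ bye';

// addslashes() puts a backslash in front of ', ", \ and the NUL byte.
$escaped = addslashes($name);
$sql = "INSERT INTO people (name) VALUES ('$escaped')";

// For a BLOB, sidestep escaping entirely: send hex digits and let
// MySQL turn them back into bytes with UNHEX().
$blob = "\x00\xFF'\"\\\n";   // stand-in for binary data such as a .jpg
$hex = bin2hex($blob);       // only 0-9 and a-f, so nothing needs escaping
$blobSql = "INSERT INTO images (data) VALUES (UNHEX('$hex'))";

echo $sql, "\n", $blobSql, "\n";
```

Note that neither step says anything about the character set; that is the second item.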
The second item's purpose is to say "this stream of bytes is encoded this way (utf8/latin1/etc)". Its only use is for converting between the CHARACTER SET of the column being stored/fetched and the desired encoding in your client (PHP, etc). It is handled in a variety of ways by various languages. For PHP:
mysql_* -- Do not use this interface; it is deprecated and was removed in PHP 7.0.
mysqli_* -- mysqli::set_charset(...)
PDO -- new PDO('...;charset=UTF8', ...)
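The two supported interfaces could be wired up roughly as follows. This is a sketch, not a definitive setup: the function names (build_mysql_dsn, connect_mysqli, connect_pdo) are hypothetical helpers, and utf8mb4 is assumed as the client encoding. Nothing connects until one of the connect functions is called with real credentials.

```php
<?php
// Hypothetical helper: builds a PDO DSN that carries the client encoding.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return "mysql:host=$host;dbname=$dbname;charset=$charset";
}

// mysqli: declare the client encoding right after connecting.
function connect_mysqli(string $host, string $user, string $pass, string $dbname): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbname);
    $db->set_charset('utf8mb4');
    return $db;
}

// PDO: the encoding travels in the DSN, so no separate call is needed.
function connect_pdo(string $host, string $user, string $pass, string $dbname): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $pass);
}

echo build_mysql_dsn('localhost', 'test'), "\n";
```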
Does set_charset() do something with real_escape_string? I do not know. But it should not matter. SET NAMES obviously cannot, since it is a MySQL command and knows nothing about PHP.
htmlentities() is another PHP function in this area. It turns 8-bit codes into &...; entities. This should not be used going into MySQL. It would only mask other problems. Use it only in certain situations involving HTML, not PHP or MySQL.
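To illustrate where htmlentities() belongs, here is a small sketch that applies it only when producing HTML. The sample string is hypothetical, and UTF-8 is assumed as the encoding of the text.

```php
<?php
// "café" followed by a tag; the é is written as explicit UTF-8 bytes
// so the example does not depend on the encoding of this file.
$text = "caf\xC3\xA9 <b>";

// Store $text in MySQL unchanged. Convert to entities only on the way
// into an HTML page.
$forHtml = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $forHtml, "\n";   // caf&eacute; &lt;b&gt;
```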
The only reasonable CHARACTER SETs to use today are ascii, latin1, utf8, and utf8mb4. Those have no "characters" in the "control" area. Sjis and a few other character sets do. This confusion over control characters may be a reason for real_escape_string existing.
Conclusion:
As I see it, you need two mechanisms: One for escaping, and one for establishing the encoding in the client. They are separate.
If they are tied together, the PHP manual has failed to provide any compelling reason for picking one method over another.