mb_detect_encoding detects ASCII as UTF-8?

Submitted by 社会主义新天地 on 2019-11-28 00:50:45

Question


I'm trying to automatically convert imported IPTC metadata from images to UTF-8 for storage in a database based on the PHP mb_ functions.

Currently it looks like this:

$val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));

However, when mb_detect_encoding() is supplied an extended-ASCII string (one with special characters in the Latin-1 range, bytes 192-255), it detects it as UTF-8, so the subsequent conversion to proper UTF-8 removes all the special characters.

I tried writing my own method that looks for Latin-1 values and, if none occurred, lets mb_detect_encoding decide what it is. But I stopped midway when I realized that I can't be sure that other encodings don't use the same byte values for other things.

So, is there a way to properly detect ASCII to feed to mb_convert_encoding as the source encoding?


Answer 1:


Specifying a custom order, where ASCII is detected first, works.

mb_detect_encoding($val, 'ASCII,UTF-8,ISO-8859-15');

For completeness, the list of available encodings is at http://www.php.net/manual/en/mbstring.supported-encodings.php
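
As a rough sketch of how the detection order above could be combined with the conversion from the question: the helper name detect_and_convert is hypothetical, and passing true as the third argument (strict mode) is an assumption added here, not part of the original answer. Strict mode is intended to reject candidates in which the string is not fully valid, which should keep a lone Latin-1 byte from being accepted as UTF-8.

```php
<?php
// Hypothetical helper, not from the original answer: detect with an explicit
// candidate order, then convert to UTF-8.
function detect_and_convert(string $val): string
{
    // Candidate order taken from the answer: ASCII first, then UTF-8,
    // then a single-byte Latin encoding as the last resort.
    $candidates = ['ASCII', 'UTF-8', 'ISO-8859-15'];

    // Strict mode (third argument) is an assumption added in this sketch.
    $detected = mb_detect_encoding($val, $candidates, true);

    if ($detected === false) {
        // No candidate matched. Returning the input unchanged is an
        // arbitrary choice for this sketch; throwing would also be reasonable.
        return $val;
    }

    return mb_convert_encoding($val, 'UTF-8', $detected);
}

// "\xE9" is the byte for "é" in ISO-8859-1 and ISO-8859-15.
$utf8 = detect_and_convert("caf\xE9");
```

Detection remains a heuristic: many byte sequences are valid in several single-byte encodings at once, so the candidate list should only contain encodings the data can plausibly use.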




Answer 2:


You can specify the source encoding explicitly:

$val = mb_convert_encoding($val, 'UTF-8', 'ASCII');

EDIT:

$val = mb_convert_encoding($val, 'UTF-8', 'auto');
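
If the only realistic inputs are UTF-8 and one known legacy encoding, an explicit source encoding can be paired with a validity check instead of detection. This is a different technique from the calls above, shown only as a sketch: the helper name ensure_utf8 and the ISO-8859-1 fallback are assumptions and would need to match the actual data.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise convert from an
// explicitly named fallback encoding (ISO-8859-1 is assumed here).
function ensure_utf8(string $val, string $fallback = 'ISO-8859-1'): string
{
    // Plain ASCII is also valid UTF-8, so it passes this check unchanged.
    if (mb_check_encoding($val, 'UTF-8')) {
        return $val;
    }

    // The string is not valid UTF-8; treat it as the fallback encoding.
    return mb_convert_encoding($val, 'UTF-8', $fallback);
}

$val = ensure_utf8("caf\xE9"); // Latin-1 input, converted to UTF-8
```

The trade-off is that any non-UTF-8 input is treated as the fallback encoding, so this only fits when the set of possible source encodings is known in advance.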



Answer 3:


If you do not want to worry about which encodings to allow, you can add them all:

$encoding = mb_detect_encoding($val, implode(',', mb_list_encodings()));



Source: https://stackoverflow.com/questions/16298639/mb-detect-encoding-detects-ascii-as-utf-8
