Convert html entities to UTF-8, but keep existing UTF-8

风流意气都作罢 submitted on 2021-02-10 11:45:31

Question


I want to convert HTML entities to UTF-8, but mb_convert_encoding destroys characters that are already UTF-8 encoded. What's the correct way?

$text = "äöü &auml; &ouml; &uuml; &szlig;";
var_dump(mb_convert_encoding($text, 'UTF-8', 'HTML-ENTITIES'));
// string(24) "Ã¤Ã¶Ã¼ ä ö ü ß"

Answer 1:


mb_convert_encoding() isn't the right function for this. Use html_entity_decode() instead: it converts only the actual HTML entities to UTF-8 and leaves the existing UTF-8 characters in the string untouched.

$text = "äöü &auml; &ouml; &uuml; &szlig;";
var_dump(html_entity_decode($text, ENT_COMPAT | ENT_HTML401, 'UTF-8'));

which gives

string(18) "äöü ä ö ü ß"
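
A minimal sketch for checking this behaviour independently of how the script file is saved. The existing "äöü" is written as explicit UTF-8 byte escapes, which is an assumption made here for reproducibility and not part of the original answer:

```php
<?php
// "äöü" as explicit UTF-8 bytes, so the result does not depend on the
// encoding of this source file.
$utf8 = "\xC3\xA4\xC3\xB6\xC3\xBC";
$text = $utf8 . " &auml; &ouml; &uuml; &szlig;";

$decoded = html_entity_decode($text, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// The pre-existing UTF-8 bytes are still at the front, unchanged.
var_dump(strncmp($decoded, $utf8, strlen($utf8)) === 0); // bool(true)

// The whole result is valid UTF-8: 6 bytes + 4 spaces + 4 two-byte characters.
var_dump(mb_check_encoding($decoded, 'UTF-8')); // bool(true)
var_dump(strlen($decoded));                     // int(18)
```

The 18-byte length matches the string(18) reported above, whereas the 24 bytes in the question come from the original "äöü" being re-encoded a second time.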





Answer 2:


On my localhost I get string(18) "äöü ä ö ü ß".

I think it's related to the encoding of your script file. Open the file in Notepad++, go to the Encoding menu and choose 'Encode in ANSI'. If that doesn't work, try 'Encode in UTF-8 without BOM'.
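
To see which encoding a string literal actually ended up in, one option is to inspect its bytes. This is a rough sketch; describe_bytes() is a hypothetical helper introduced here for illustration, not something from the answer:

```php
<?php
// Hypothetical helper: report how a string is actually stored.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    return strlen($s) . ' bytes, ' . $valid . ', hex ' . bin2hex($s);
}

// "ä" saved as UTF-8 is two bytes (c3 a4);
// saved as ANSI / Windows-1252 it is a single byte (e4).
echo describe_bytes("\xC3\xA4"), "\n"; // 2 bytes, valid UTF-8, hex c3a4
echo describe_bytes("\xE4"), "\n";     // 1 bytes, not valid UTF-8, hex e4
```

Passing the literal from your own script (for example describe_bytes("ä")) shows whether the editor saved the file as UTF-8 or as a single-byte encoding.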




Answer 3:


And if that still isn't working, try this:

html_entity_decode($html, ENT_QUOTES, 'cp1252');

This is what was needed on a Windows IIS system for things to start working correctly.
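
Note that with 'cp1252' as the charset, html_entity_decode() returns Windows-1252 bytes, so this only makes sense when the input itself is Windows-1252. A hedged sketch of how the result could then be brought to UTF-8, assuming the mbstring extension is available; the sample input is made up for illustration:

```php
<?php
// Assumed input: text stored as Windows-1252, where "ä" is the single byte 0xE4.
$html = "\xE4 &ouml;";

// Entities are decoded into Windows-1252 bytes: "&ouml;" becomes 0xF6.
$cp1252 = html_entity_decode($html, ENT_QUOTES, 'cp1252');

// Convert the whole string to UTF-8 in one step afterwards.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

var_dump(bin2hex($cp1252)); // string(6) "e420f6"
var_dump(bin2hex($utf8));   // string(10) "c3a420c3b6"
```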



Source: https://stackoverflow.com/questions/31338277/convert-html-entities-to-utf-8-but-keep-existing-utf-8
