How to convert ISO 8859-1 characters to UTF-8

牧云@^-^@ submitted on 2019-12-02 08:28:41

I'd recommend using iconv.

iconv --list gives you a list of all known encodings, and you can then use iconv -f FROM_ENCODING -t TO_ENCODING to do your conversion. It can also read from stdin, so the output of curl can be piped straight into it.
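For readers who want to do the same re-encoding inside a script rather than with iconv, here is a minimal Python sketch. The sample bytes and the function name latin1_to_utf8 are made up for illustration; the only assumption is that the input really is ISO 8859-1.

```python
# Hypothetical sample: "für" stored as ISO 8859-1, where ü is the single byte 0xFC.
latin1_bytes = b"f\xfcr"


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8 (illustrative helper, not from the thread)."""
    # Decode with the source encoding first, then encode with the target encoding.
    return data.decode("iso-8859-1").encode("utf-8")


utf8_bytes = latin1_to_utf8(latin1_bytes)
print(utf8_bytes)  # b'f\xc3\xbcr'
```

This is roughly what iconv -f ISO-8859-1 -t UTF-8 does to a byte stream: ü goes from one byte (0xFC) to two bytes (0xC3 0xBC).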

But regarding the comment you got on your question: it seems the file's author didn't care about using the correct encoding and decided to stick with (old-style?) HTML entities such as &auml; instead.

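Put your string in a variable and pass it to the following function.

// $var holds the ISO 8859-1 string to convert
$var = "";
// utf8_encode() is deprecated as of PHP 8.2; mb_convert_encoding($var, 'UTF-8', 'ISO-8859-1') is the replacement
echo utf8_encode($var);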
AJJ

Judging from the line you pasted, the problem appears to be with HTML entities, not with character encoding. The encoded chars look fine to me.

You need to translate those HTML entities to encoded chars. Which tool to use will depend on your environment or programming language. I don't think it can be done with curl alone.

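PHP has html_entity_decode() (htmlspecialchars_decode() only handles a few special characters such as &amp; and &lt;, not &auml;). Python has unescape() in the html module (in Python 2 it lived in the HTMLParser module).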
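As a small sketch of the Python route, assuming Python 3.4 or later where html.unescape() is available, the entity translation looks like this. The sample string is invented for illustration.

```python
from html import unescape

# Hypothetical input containing named HTML entities, as a scraped page might.
text = "M&auml;dchen &amp; caf&eacute;"

# unescape() replaces named and numeric character references with the characters they stand for.
decoded = unescape(text)
print(decoded)  # Mädchen & café
```

The result is an ordinary Python string; which bytes it becomes depends on the encoding you choose when you write it out.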

curl does not convert anything; it downloads things "as is".

What you see are character entities, which are valid HTML, and it is the browser that does the conversion to a readable form.

You can check this by opening the file saved by curl in a browser. It will look like the live page.

Minh Tai Nguyen

You can try this:

html_entity_decode($string)

See the PHP manual entry for html_entity_decode for more.

Your files aren't being converted to another encoding. They're using HTML character entities. You need to convert those entities, such as &eacute;, to UTF-8 characters, such as é. This takes one extra line of code after you convert to UTF-8, if you even need to do that.
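To make the "one extra line" concrete, here is a hedged Python sketch of the whole pipeline. The function name page_to_utf8, its source_encoding parameter, and the sample bytes are assumptions for illustration; the thread does not show the actual file.

```python
from html import unescape


def page_to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode downloaded bytes, resolve HTML entities, and return UTF-8 bytes (illustrative helper)."""
    text = raw.decode(source_encoding)  # step 1: interpret the bytes in their original encoding
    text = unescape(text)               # step 2: the extra line that turns &eacute; into é
    return text.encode("utf-8")         # step 3: write the result out as UTF-8


# Hypothetical sample mixing an entity with a raw ISO 8859-1 byte (0xFC is ü).
raw = b"caf&eacute; f\xfcr"
print(page_to_utf8(raw).decode("utf-8"))  # café für
```

If the downloaded file is already plain ASCII with entities, step 1 changes nothing and only the unescape step matters.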
