How to convert ISO 8859-1 characters to UTF-8

走远了吗. Posted on 2019-12-02 17:54:36

Question


I use cURL to fetch content from another site, but I don't know why the content is automatically converted from UTF-8 to ISO 8859-1, as shown here:

site: abc.com:

Cửa Hàng Chip Chip: Rộn ràng đón Giáng sinh với những vật phẩm trang trí Noel đầy màu sắc của CHIPCHIP GIFT SHOP

But when I use cURL to get the content from that site, I get the following:

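C&#7917;a H&agrave;ng Chip Chip: R&#7897;n r&agrave;ng &#273;&oacute;n Gi&aacute;ng sinh v&#7899;i nh&#7919;ng v&#7853;t ph&#7849;m trang tr&iacute; Noel &#273;&#7847;y m&agrave;u s&#7855;c c&#7911;a CHIPCHIP GIFT SHOP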

So how do I convert it to UTF-8?


Answer 1:


I'd recommend using iconv.

iconv --list gives you a list of all known encodings, and you can then use iconv -f FROM_ENCODING -t TO_ENCODING to do your conversion. It can also read from stdin, so curl output can be piped straight into it.
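
For readers who would rather do this conversion inside a script, here is a minimal Python sketch of roughly what iconv -f ISO-8859-1 -t UTF-8 does. The helper name latin1_to_utf8 is made up for illustration, and the sketch assumes the input bytes really are ISO 8859-1.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value maps to a character in ISO 8859-1, so this decode
    # never raises; it silently yields wrong text if the input was in
    # some other encoding.
    text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


latin1_bytes = b"Gi\xe1ng sinh"   # "Giáng sinh" encoded as ISO 8859-1
utf8_bytes = latin1_to_utf8(latin1_bytes)
print(utf8_bytes)                 # the single byte E1 becomes the pair C3 A1
```

Note that this only helps when the bytes are actually in the wrong encoding; it does nothing for HTML entities.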

But regarding the comment you got on your question: it seems the page author didn't care about using the correct encoding and decided to stick with (old-style?) entities such as &auml; and the like.




Answer 2:


Put your string in a variable and use the following function.

// utf8_encode() treats $var as ISO 8859-1 and returns it as UTF-8 (deprecated since PHP 8.2)
$var = "";
echo utf8_encode($var);



Answer 3:


Judging from the line you pasted, the problem appears to be with HTML entities, not with character encoding. The encoded chars look fine to me.

You need to translate those HTML entities into encoded chars. Which tool to use will depend on your environment or programming language. I don't think it can be done with cURL alone.

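PHP has html_entity_decode() (htmlspecialchars_decode() only handles a few entities such as &amp; and &lt;). Python has unescape() from the HTMLParser module, which is html.unescape() in Python 3.4 and later.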
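
As a rough illustration of the Python route, the sketch below uses html.unescape() from the standard library (Python 3.4+). The sample string is an assumed entity-encoded form of the start of the headline from the question, not markup copied from the real site.

```python
import html

# Assumed sample: part of the headline with its non-ASCII letters written
# as named and numeric HTML character references.
fetched = ("C&#7917;a H&agrave;ng Chip Chip: "
           "R&#7897;n r&agrave;ng &#273;&oacute;n Gi&aacute;ng sinh")

# html.unescape() replaces both named (&agrave;) and numeric (&#7917;)
# references with the characters they stand for.
decoded = html.unescape(fetched)
print(decoded)
```

The result is an ordinary Python str; encode it with .encode("utf-8") when writing it out as bytes.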




Answer 4:


curl does not convert anything; it downloads things "as is".

What you see are character entities, which are valid HTML, and it is the browser that does the conversion to a readable form.

You can check this by opening the file saved by curl in a browser. It will look like the live page.
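
Another quick check, sketched in Python under the assumption that the saved file contains entity-encoded markup like the sample below: character references are written with ASCII characters only, so the bytes read the same whether they are treated as ISO 8859-1 or UTF-8, and no charset conversion is involved.

```python
# Assumed sample of what curl might have saved to disk.
saved = b"R&#7897;n r&agrave;ng &#273;&oacute;n Gi&aacute;ng sinh"

# Entity references consist of ASCII characters only (bytes.isascii needs Python 3.7+).
is_ascii = saved.isascii()

# Pure ASCII decodes to identical text under both charsets.
same_text = saved.decode("iso-8859-1") == saved.decode("utf-8")

print(is_ascii, same_text)
```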




Answer 5:


You can try this:

html_entity_decode($string)

See the PHP manual entry for html_entity_decode for more.




Answer 6:


Your files aren't being converted to another encoding. They're using HTML character entities. You need to convert those entities, such as &eacute;, to UTF-8 characters, such as é. This takes one extra line of code after you convert to UTF-8, if you even need to do that.
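
The two steps described here could look something like the following Python sketch. The function name page_to_utf8 and its charset parameter are hypothetical. Note that html.unescape() also expands &lt; and &amp;, so it is safer to apply it to extracted text than to a whole HTML page.

```python
import html


def page_to_utf8(raw: bytes, charset: str = "utf-8") -> bytes:
    """Decode fetched bytes, expand HTML entities, return UTF-8 bytes.

    Hypothetical helper; `charset` is whatever the server declared.
    """
    text = raw.decode(charset)    # step 1: only matters for non-ASCII bytes
    text = html.unescape(text)    # step 2: the "one extra line" for entities
    return text.encode("utf-8")


from_entities = page_to_utf8(b"caf&eacute;")
from_latin1 = page_to_utf8(b"caf\xe9", charset="iso-8859-1")
print(from_entities == from_latin1)
```

Both inputs end up as the same UTF-8 bytes, which is why the entity step works regardless of whether a charset conversion was needed first.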



Source: https://stackoverflow.com/questions/8253914/how-to-convert-iso-8859-1-characters-to-utf-8
