Question
I use cURL to get content from another site, but I don't know why it automatically converts from UTF-8 to ISO-8859-1, like this:
On the site (abc.com):
Cửa Hàng Chip Chip: Rộn ràng đón Giáng sinh với những vật phẩm trang trí Noel đầy màu sắc của CHIPCHIP GIFT SHOP
But when I use cURL to get the content from that site, I get the following:
Cửa Hàng Chip Chip: Rộn ràng đón Giáng sinh với những vật phẩm trang trí Noel đầy màu sắc của CHIPCHIP GIFT SHOP
So how do I convert it to UTF-8?
Answer 1:
I'd recommend using iconv.
iconv --list gives you a list of all known encodings, and you can then use iconv -f FROM_ENCODING -t TO_ENCODING to do your conversion. It can also read from stdin, so you can pipe curl's output into it.
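For comparison, here is a minimal Python sketch of the same re-encoding that iconv -f ISO-8859-1 -t UTF-8 performs. It assumes the input really is ISO-8859-1; the function names convert and filter_stdin_to_stdout are made up for this example.

```python
import sys


def convert(data: bytes, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> bytes:
    """Re-encode raw bytes from one charset to another, like iconv -f FROM -t TO."""
    # Decode with the source charset to get text, then encode with the target charset.
    return data.decode(from_enc).encode(to_enc)


def filter_stdin_to_stdout(from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> None:
    """Read all of stdin and write the re-encoded bytes to stdout (usable after curl in a pipe)."""
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read(), from_enc, to_enc))
```

A script that calls filter_stdin_to_stdout() could sit after curl in a pipe the same way iconv does, e.g. curl -s http://abc.com/ | iconv -f ISO-8859-1 -t UTF-8.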
But regarding the comment you got on your question: it seems the file's author didn't care about using the correct encoding and decided to stick with (old-style?) entities such as &auml; and so on.
Answer 2:
Put your string in a variable and use the following function:
$var = ""; // your ISO-8859-1 string goes here
echo utf8_encode($var); // ISO-8859-1 to UTF-8; deprecated as of PHP 8.2
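PHP's utf8_encode() reads its input as ISO-8859-1 and outputs UTF-8. A rough Python equivalent (utf8_encode_like is a hypothetical name) makes it easy to see what that conversion does to the kinds of input in this question; the sample bytes are illustrative, not taken from the real page.

```python
def utf8_encode_like(data: bytes) -> bytes:
    """Python equivalent of PHP's utf8_encode(): read each byte as ISO-8859-1, emit UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Genuine ISO-8859-1 input is converted as intended: byte 0xE9 becomes C3 A9.
print(utf8_encode_like(b"caf\xe9"))               # b'caf\xc3\xa9'

# HTML entities are plain ASCII, so they pass through untouched.
print(utf8_encode_like(b"C&#7917;a H&#224;ng"))  # b'C&#7917;a H&#224;ng'

# Input that is already UTF-8 gets encoded a second time ("é" turns into "Ã©").
print(utf8_encode_like("\u00e9".encode("utf-8")))  # b'\xc3\x83\xc2\xa9'
```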
Answer 3:
Judging from the line you pasted, the problem appears to be with HTML entities, not with character encoding. The encoded characters look fine to me.
You need to translate those HTML entities into encoded characters. Which tool to use depends on your environment or programming language. I don't think it can be done with cURL alone.
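In PHP, use html_entity_decode(); htmlspecialchars_decode() only decodes a few special characters such as &amp;, &lt; and &gt;. In Python, use html.unescape() (the older unescape() method from the HTMLParser module was removed in Python 3.9).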
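A minimal sketch of the Python route, using html.unescape() from the standard library. The input string is a hypothetical fragment written with numeric character references in the style the answers describe, not the actual page source.

```python
import html

# Hypothetical fragment: non-ASCII letters written as numeric character references.
raw = "C&#7917;a H&#224;ng Chip Chip"

# html.unescape resolves numeric (&#224;) and named (&eacute;) references to characters.
text = html.unescape(raw)
print(text == "C\u1eeda H\u00e0ng Chip Chip")                 # True
print(html.unescape("&eacute; &amp; &lt;b&gt;") == "\u00e9 & <b>")  # True
```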
Answer 4:
curl does not convert anything; it downloads things "as is".
What you see are character entities, which are valid HTML, and it is the browser that converts them into a readable form.
You can check this by opening the file saved by curl in a browser. It will look like the live page.
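Another way to confirm that curl saved the bytes unchanged is to check the file for non-ASCII bytes and list the entity references it contains. This is a rough sketch: describe_payload is a hypothetical helper, and the regular expression only approximates HTML's character-reference syntax (for example, it ignores references written without the trailing semicolon).

```python
import re

# Matches numeric (&#233;, &#xE9;) and named (&eacute;) character references.
ENTITY_RE = re.compile(rb"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def describe_payload(raw: bytes) -> dict:
    """Report whether downloaded bytes are pure ASCII and which entities they contain."""
    return {
        "ascii_only": raw.isascii(),
        "entities": [m.decode("ascii") for m in ENTITY_RE.findall(raw)],
    }


# Hypothetical bytes as curl might save them: ASCII only, no re-encoding applied.
print(describe_payload(b"C&#7917;a H&#224;ng"))
# {'ascii_only': True, 'entities': ['&#7917;', '&#224;']}
```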
Answer 5:
You can try this:
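echo html_entity_decode($string, ENT_QUOTES, 'UTF-8'); // pass 'UTF-8' explicitly; before PHP 5.4 the default charset was ISO-8859-1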
See the PHP manual for more details: html_entity_decode.
Answer 6:
Your files aren’t being converted to another encoding. They’re using HTML character entities. You need to convert those entities, such as &eacute;, to UTF-8, such as é. This takes one extra line of code after you convert to UTF-8, if you even need to do that.
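The two steps described in this answer (decode the bytes, then resolve the entities) might look like this in Python. This is a sketch that assumes you know the page's real charset; page_bytes_to_text is a hypothetical name, and "utf-8" is only an assumed default.

```python
import html


def page_bytes_to_text(raw: bytes, charset: str = "utf-8") -> str:
    """Decode downloaded bytes with the page's charset, then resolve HTML entities."""
    # Step 1: bytes -> str (skip this if you already have text).
    text = raw.decode(charset)
    # Step 2: the "one extra line": turn &eacute; into é, &#224; into à, and so on.
    return html.unescape(text)


# Encode the result as UTF-8 bytes if you need to write it out.
utf8_bytes = page_bytes_to_text(b"caf&eacute; &#224; la carte").encode("utf-8")
```

Keeping the charset as a parameter reflects the point above: the entity step is the same whatever the encoding, and the decode step only matters if the bytes are not already in the encoding you want.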
Source: https://stackoverflow.com/questions/8253914/how-to-convert-iso-8859-1-characters-to-utf-8