£ becomes Â£. Why? XML ISO encoding issue?

岁酱吖の submitted on 2019-12-02 01:24:46

The Unicode code point for £ is U+00A3. In UTF-8 it is encoded as the two bytes 0xC2 0xA3. In ISO-8859-1, 0xC2 is Â and 0xA3 is £. So somewhere in the flow, what you enter is encoded as UTF-8 and then interpreted as ISO-8859-1. Have you looked at how the "form" encodes the data before it reaches your PHP code?
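A minimal sketch of that mix-up in plain PHP, with no extensions needed. The helper latin1_to_utf8() is hypothetical: it treats every input byte as an ISO-8859-1 character (byte value equals code point) and re-encodes it as UTF-8, which is effectively what happens when UTF-8 form data is read as ISO-8859-1.

```php
<?php
// Sketch: reproduce "£ becomes Â£" byte by byte.
// latin1_to_utf8() is a hypothetical helper, not a PHP built-in.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $ch) {
        $b = ord($ch);
        if ($b < 0x80) {
            $out .= $ch;                        // ASCII bytes are identical in both encodings
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // UTF-8 lead byte for U+0080..U+00FF
                  . chr(0x80 | ($b & 0x3F));    // UTF-8 continuation byte
        }
    }
    return $out;
}

$pound = "\u{00A3}";                    // U+00A3, stored by PHP as UTF-8
echo bin2hex($pound), "\n";             // c2a3
$garbled = latin1_to_utf8($pound);      // 0xC2 read as Â, 0xA3 read as £
echo $garbled, "\n";                    // prints Â£ on a UTF-8 terminal
```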

And besides, what is this SimpleDOM doing with respect to entities? &Acirc; and &pound; are not valid XML entities without a declaration; XML predefines only &amp;, &lt;, &gt;, &apos; and &quot;. Does SimpleDOM add the declarations?
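To see the entity problem in isolation, here is a small sketch using PHP's built-in SimpleXML rather than SimpleDOM (SimpleDOM's own handling is not shown). A document with no DTD may only use the predefined entities, so &pound; should make the parse fail, while the numeric character reference &#163; is always allowed.

```php
<?php
// Sketch: undeclared named entity vs. numeric character reference.
libxml_use_internal_errors(true);   // collect parse errors instead of printing warnings

$named   = simplexml_load_string('<price>&pound;5</price>');  // nothing declares "pound"
$numeric = simplexml_load_string('<price>&#163;5</price>');   // U+00A3 given by number

var_dump($named === false);         // bool(true)
echo (string) $numeric, "\n";       // £5

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";   // e.g. an "Entity 'pound' not defined" message
}
libxml_clear_errors();
```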

Forty-two's response definitely fixed one of the problems: I was declaring encoding="iso-8859-1" in the XML document but using UTF-8 in the HTML meta content-type tag.
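A sketch of that mismatch, assuming the browser submitted the form as UTF-8 to match the page's meta tag. $formValue stands in for the submitted field and parse_with() is a hypothetical helper; both are for illustration only. The same bytes C2 A3 come back as Â£ when the XML declaration says ISO-8859-1, and as £ when the declaration agrees with the data.

```php
<?php
// Sketch: the XML declaration decides how the parser decodes the raw bytes.
$formValue = "\u{00A3}5";   // hypothetical form input: UTF-8 bytes C2 A3 35

// Hypothetical helper: wrap the bytes in a document with the given declaration.
function parse_with(string $encoding, string $value): string
{
    $xml = simplexml_load_string(
        '<?xml version="1.0" encoding="' . $encoding . '"?><price>' . $value . '</price>'
    );
    return (string) $xml;   // SimpleXML hands text back as UTF-8
}

echo parse_with('ISO-8859-1', $formValue), "\n";  // Â£5 (declaration contradicts the data)
echo parse_with('UTF-8', $formValue), "\n";       // £5  (declaration matches the data)
```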

One other thing to note if anyone comes across this answer. I was also having brutal problems with the curved quote from Windows documents (copying text from Word 2007 into an HTML form field on my site). There is a BIG difference between a curved quote and an apostrophe: on English keyboards, Word automatically replaces the straight apostrophe you type (') with a curved single quote (’). ISO-8859-1 has no such character; it is part of Windows-1252 instead, at byte 0x92. These characters were breaking my XML documents when PHP parsed them from the form field. The solution was simple:
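A small sketch of why ISO-8859-1 cannot carry Word's curved apostrophe, assuming the pasted text arrives as Windows-1252 bytes ($formBytes is a made-up sample). Parsed under an ISO-8859-1 declaration, byte 0x92 becomes the invisible C1 control character U+0092 rather than U+2019 RIGHT SINGLE QUOTATION MARK, so the quote does not survive.

```php
<?php
// Sketch: Word's curly apostrophe (Windows-1252 byte 0x92) read as ISO-8859-1.
$formBytes = "It" . "\x92" . "s";   // made-up sample: "It’s" as Windows-1252 bytes

$xml = simplexml_load_string(
    '<?xml version="1.0" encoding="ISO-8859-1"?><q>' . $formBytes . '</q>'
);
$text = (string) $xml;              // SimpleXML returns UTF-8 text

// In ISO-8859-1 every byte maps to the code point with the same value,
// so 0x92 becomes the control character U+0092, not the quote U+2019.
var_dump($text === "It\u{0092}s");  // bool(true)
var_dump($text === "It\u{2019}s");  // bool(false)
```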

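// Read $var as Windows-1252 and convert curved quotes (and every other character
// that has a named HTML entity) into entities; ENT_QUOTES also encodes ' and ".
$var = htmlentities($var, ENT_QUOTES, "Windows-1252");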
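The effect of the charset argument can be checked directly. This sketch assumes the input is the Windows-1252 byte sequence for "It’s" (a made-up sample): with "Windows-1252", byte 0x92 is recognized as a right single quote and becomes &rsquo;, while a straight apostrophe becomes &#039; because of ENT_QUOTES.

```php
<?php
// Sketch: htmlentities() must be told the real input charset.
$word     = "It" . "\x92" . "s";   // Word's curly apostrophe as a Windows-1252 byte
$straight = "It's";                // plain ASCII apostrophe

echo htmlentities($word, ENT_QUOTES, "Windows-1252"), "\n";      // It&rsquo;s
echo htmlentities($straight, ENT_QUOTES, "Windows-1252"), "\n";  // It&#039;s
```

Note that &rsquo; is a named HTML entity, not one of XML's predefined entities, so if this string is written into XML verbatim, the earlier point about undeclared entities applies to it as well.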

Other people have alluded to htmlentities and strip_tags, but it took me four half-days to pull all this together. Hopefully this saves someone some time.
