Character Encoding Issue - Strange Behaviour From Pound Signs (£) with UTF-8 IE6 / ASP / XML


星月不相逢

I am having a very strange problem with pound signs displaying incorrectly (or not at all) on a web page.

I am keying text in a textbox, which then gets (briefly) st


2 Answers

爱一瞬间的悲伤

2020-12-17 06:02
I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window.

The problem is most likely embedded in this sequence. It would help if you could elaborate on the specifics of how this sequence is achieved.

The most common cause of this sort of problem is a mismatch between how the client actually encodes a character and the encoding the server assumes. The simplest solution is to place the accept-charset attribute on the form element, which makes the character encoding of the post explicit.

With accept-charset set to utf-8 on the form, the text posted in the stuff field will be encoded in UTF-8.
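
The mismatch described above can be reproduced outside the browser. The following is a minimal Python sketch, not the original ASP code: it models a posted form body with a field named stuff, and the helper server_reads is a hypothetical stand-in for whatever the server does when it decodes the body.

```python
from urllib.parse import urlencode, parse_qs

# What a client might send for a field named "stuff" holding a pound sign,
# depending on the encoding it uses for the post.
posted_utf8 = urlencode({"stuff": "£"}, encoding="utf-8")      # bytes C2 A3
posted_latin1 = urlencode({"stuff": "£"}, encoding="latin-1")  # byte A3


def server_reads(body: str, assumed_encoding: str) -> str:
    """Hypothetical server side: decode the field using an assumed encoding."""
    fields = parse_qs(body, encoding=assumed_encoding, errors="replace")
    return fields["stuff"][0]


# Both sides agree on UTF-8: the pound sign survives.
agreed = server_reads(posted_utf8, "utf-8")

# Client sent UTF-8, server assumed ISO-8859-1: a stray extra character appears.
mismatch_a = server_reads(posted_utf8, "latin-1")

# Client sent ISO-8859-1, server assumed UTF-8: the byte is invalid on its own,
# so it becomes the replacement character (or vanishes, depending on the decoder).
mismatch_b = server_reads(posted_latin1, "utf-8")

print(posted_utf8, repr(agreed), repr(mismatch_a), repr(mismatch_b))
```

The second and third cases correspond to a pound sign that displays incorrectly or not at all. Making the encoding explicit on the form removes the guess on the client side, but the server still has to decode with the same encoding.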

Some of the inconsistencies can be explained as follows:
1. The server may store the characters in the db incorrectly, yet reverse that corruption when it sends the same characters back to a browser, so things look fine again in the browser.
2. ISO-8859-1 means different things in different places. IE6 is somewhat loose with that character set and will actually treat it as Windows-1252. Other applications apply a stricter interpretation of ISO-8859-1.
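
Both points can be illustrated with Python's built-in codecs. This is only a sketch under assumed conditions: the variable names are made up, and the store-then-send sequence is one plausible way a round trip can hide corruption, not a description of the asker's actual server.

```python
# Point 1: corruption that cancels itself out on the way back.
# The server receives UTF-8 bytes but decodes them as ISO-8859-1 ...
received = "£".encode("utf-8")
stored_in_db = received.decode("latin-1")   # wrong text is now stored

# ... then encodes with the same wrong assumption when sending the page,
# so a browser reading the response as UTF-8 sees the original character.
sent_to_browser = stored_in_db.encode("latin-1")
shown_in_browser = sent_to_browser.decode("utf-8")

# Point 2: ISO-8859-1 and Windows-1252 agree on the pound sign (byte A3)
# but disagree on bytes 80-9F, which are control codes in ISO-8859-1.
pound_strict = b"\xa3".decode("latin-1")
pound_loose = b"\xa3".decode("cp1252")
byte80_strict = b"\x80".decode("latin-1")   # C1 control character U+0080
byte80_loose = b"\x80".decode("cp1252")     # euro sign

print(repr(stored_in_db), repr(shown_in_browser))
print(repr(byte80_strict), repr(byte80_loose))
```

Anything that reads the stored value directly, without the compensating second step, sees the corrupted text, which is one way the same data can look right in one place and wrong in another.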

情书的邮戳

2020-12-17 06:20

You say that the site doesn't always say which character encodings are being used. In that case, browsers will have to guess. And they might guess differently on different pages, which is quite likely the reason why you're seeing inconsistencies.

A lot of character encodings are "ASCII plus" (ASCII plus extended Latin characters; ASCII plus the Greek alphabet; ASCII plus the Cyrillic alphabet; etc.). How is a browser supposed to know which is intended? One way is by looking at code-point frequency: "I'm seeing a lot of the code-point [blah], which would be character [?A] in Greek, or character [?B] in Cyrillic. Character [?A] isn't very common in Greek, but [?B] is quite frequent in Bulgarian, so this page is quite likely in the Cyrillic alphabet." That kind of thing. And that means that slightly different text on the page, shuffling around the code-point frequencies, can lead to browsers interpreting the text encoding completely differently. This is why we use UTF-8 these days. It's also why we declare the text encoding in HTTP headers and in meta tags.
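
As a rough illustration of the frequency idea, here is a toy Python sketch. The letter tables and the guess_encoding function are hypothetical and far too small for real detection; actual browsers use much more elaborate heuristics. It only shows that the same bytes decode to different text under two "ASCII plus" encodings, and that a frequency score can pick between them.

```python
# Tiny, made-up tables of letters assumed to be common in each alphabet.
COMMON_LETTERS = {
    "iso8859_5": set("оеаинтсрвлкм"),    # Cyrillic (ISO-8859-5)
    "iso8859_7": set("αοιετσνηυρπκμλ"),  # Greek (ISO-8859-7)
}


def guess_encoding(raw: bytes) -> str:
    """Pick the candidate whose decoding contains the most 'common' letters."""
    def score(encoding: str) -> int:
        text = raw.decode(encoding, errors="replace")
        return sum(ch in COMMON_LETTERS[encoding] for ch in text)

    return max(COMMON_LETTERS, key=score)


# Bytes with no declared encoding, as a browser might receive them.
raw = "мир и мир".encode("iso8859_5")

as_cyrillic = raw.decode("iso8859_5")
as_greek = raw.decode("iso8859_7", errors="replace")
guessed = guess_encoding(raw)

print(as_cyrillic, as_greek, guessed)
```

With so little text to score, a few different characters could tip the result the other way, which is the instability described above. Declaring the encoding explicitly means no guess is needed.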
