I am having a very strange problem with pound signs displaying incorrectly (or not at all) on a web page.
I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window.
The problem most likely lies somewhere in this sequence. It would help if you could elaborate on the specifics of how this sequence is achieved.
The most common cause of this sort of problem is a mismatch between the encoding the client actually uses for a character and the encoding the server assumes it used. The simplest solution is to place the accept-charset attribute on the form element, which makes the character encoding of a post explicit.
The text posted in the stuff field will then be encoded in UTF-8.
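To illustrate the kind of mismatch described above, here is a minimal Python sketch. It is not the poster's actual setup: it simply assumes one side uses UTF-8 and the other ISO-8859-1 (Latin-1), a common pairing, and shows what happens to a pound sign in each direction.

```python
# A pound sign, written as an escape so this file's own encoding does not matter.
pound = "\u00a3"

# The same character is a different byte sequence in each encoding.
utf8_bytes = pound.encode("utf-8")      # two bytes: C2 A3
latin1_bytes = pound.encode("latin-1")  # one byte: A3

# Case 1 (assumed): the client sends UTF-8, the server decodes as Latin-1.
# The extra lead byte C2 shows up as a stray capital A with circumflex.
garbled = utf8_bytes.decode("latin-1")

# Case 2 (assumed): the client sends Latin-1, the server decodes as UTF-8.
# A lone A3 byte is not valid UTF-8, so it becomes the replacement character
# (or the pound sign vanishes, depending on how errors are handled).
replaced = latin1_bytes.decode("utf-8", errors="replace")

# Case 3: both sides agree on UTF-8 and the character survives.
round_trip = utf8_bytes.decode("utf-8")

print(repr(garbled), repr(replaced), repr(round_trip))
```

Case 1 matches the "displaying incorrectly" symptom and case 2 the "not at all" symptom, which is why making the encoding explicit at every step is usually the first thing to check.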
The reasons for some of the inconsistencies are:
You say that the site doesn't always say which character encodings are being used. In that case, browsers will have to guess. And they might guess differently on different pages, which is quite likely the reason why you're seeing inconsistencies.
A lot of character encodings are "ASCII plus" (ASCII plus extended Latin characters; ASCII plus the Greek alphabet; ASCII plus the Cyrillic alphabet; etc.). How is a browser supposed to know which is intended? One way is by looking at code-point frequency: "I'm seeing a lot of the code-point [blah], which would be character [?A] in Greek, or character [?B] in Cyrillic. Character [?A] isn't very common in Greek, but [?B] is quite frequent in Bulgarian, so this page is quite likely in the Cyrillic alphabet." That kind of thing. And that means that slightly different text on the page, shuffling around the code-point frequencies, can lead to browsers interpreting the text encoding completely differently. This is why we use UTF-8 these days. It's also why we declare the text encoding in HTTP headers and in meta tags.
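The ambiguity between "ASCII plus" encodings can be seen in a short Python sketch. The byte value 0xE1 and the three encodings below are picked purely for illustration; they are not taken from the site in question.

```python
# One byte above the ASCII range. Its meaning depends entirely on which
# single-byte encoding the reader assumes.
raw = b"\xe1"

candidates = ("iso-8859-1", "iso-8859-7", "iso-8859-5")
decodings = {name: raw.decode(name) for name in candidates}

for name, char in decodings.items():
    # Print the code point instead of the glyph so the output is unambiguous.
    print(f"{name}: U+{ord(char):04X}")

# Plain ASCII bytes decode identically under all three, so ASCII-only text
# gives a browser nothing to base its guess on.
ascii_views = {b"price".decode(name) for name in candidates}
```

Under these assumptions the same byte comes out as a Latin, a Greek, and a Cyrillic letter respectively. A browser with no declared encoding has only statistics to choose between them, whereas a declared UTF-8 encoding removes the guesswork.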