Prefer charset declaration in HTML meta tag or HTTP header?


被撕碎了的回忆 2020-12-18 04:45

I'm parsing a lot of sites. Everything works fine, and I also read the charset declarations to convert encodings. Now I have a problem with http://celleheute.de/sonntagsfuhrung-3/.

2 answers

借酒劲吻你 (original poster)

2020-12-18 05:05

There's simply no answer to this. The author of the page has committed an error by giving conflicting information. Which one is correct may as well be decided by a coin toss.

In general, I'd prefer the HTTP header as the primary value. The meta tag is just meant as a fallback anyway. If you want to follow any logic at all, first try to decode the document using the charset specified in the HTTP header. If that clearly fails because certain bytes are invalid in the given encoding, try again with the charset specified in the meta tag, if any. If that still fails, all bets are off.
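
As a rough sketch of that order in Python: the function name `decode_with_fallback` and its parameters are made up for illustration, and pulling the two charset values out of the response is assumed to happen elsewhere.

```python
from typing import Optional, Tuple


def decode_with_fallback(
    body: bytes,
    header_charset: Optional[str],
    meta_charset: Optional[str],
) -> Tuple[str, str]:
    """Return (text, charset_used), trying the HTTP header charset first."""
    for charset in (header_charset, meta_charset):
        if not charset:
            continue  # this declaration is missing, try the next one
        try:
            # bytes.decode is strict by default: invalid bytes raise an error
            return body.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: bytes are invalid in this encoding
            # LookupError: Python does not know the charset name
            continue
    raise ValueError("no declared charset decodes the document")


# Latin-1 bytes served with a wrong "utf-8" header fall back to the meta tag.
latin1_body = "Sonntagsf\u00fchrung".encode("iso-8859-1")
text, used = decode_with_fallback(latin1_body, "utf-8", "iso-8859-1")
```

Keep in mind that single-byte encodings such as ISO-8859-1 accept every byte sequence, so a successful decode in one of them proves very little on its own.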

If neither fails but the encodings conflict, either involve a human or try some statistical analysis on the decoded text, which may tell you which one is more likely to be correct.
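
A very crude version of such an analysis, as a sketch only: the names `suspicion_score` and `pick_likelier` are hypothetical, and the heuristic is my own assumption. It counts characters that rarely show up in correctly decoded text (C1 control characters, the replacement character, and the "Ã¼"-style pairs you get when UTF-8 bytes are read as Latin-1) and prefers the charset that produces fewer of them.

```python
import re

# Assumed heuristic, not a standard algorithm: patterns that are rare in
# correctly decoded text.
_SUSPICIOUS = re.compile(
    "[\u0080-\u009f\ufffd]"            # C1 controls, U+FFFD replacement char
    "|[\u00c2\u00c3][\u0080-\u00bf]"   # UTF-8 bytes misread as Latin-1
)


def suspicion_score(text: str) -> int:
    """Count suspicious sequences; lower means the text looks more plausible."""
    return len(_SUSPICIOUS.findall(text))


def pick_likelier(body: bytes, charset_a: str, charset_b: str) -> str:
    """Return whichever charset yields fewer suspicious sequences.

    Both charsets must already be known to decode `body` without errors.
    A tie goes to charset_a, so pass the HTTP header value first.
    """
    score_a = suspicion_score(body.decode(charset_a))
    score_b = suspicion_score(body.decode(charset_b))
    return charset_a if score_a <= score_b else charset_b


# UTF-8 bytes decode cleanly both ways, but the Latin-1 reading contains "Ã¼".
body = "Sonntagsf\u00fchrung".encode("utf-8")
likelier = pick_likelier(body, "iso-8859-1", "utf-8")
```

This only catches the most common kind of mix-up. For anything more serious you would want real character or word frequency statistics for the languages you expect.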
