Is a unicode user agent legal inside an HTTP header?

烈酒焚心 提交于 2019-12-06 22:52:18

问题


An application I'm maintaining loads user agents extracted from web logs into a MySQL table column using the 'latin1' charset. Occasionally, it fails to load a user agent that looks like this:

Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML^C like Gecko) Version

I suspect it's choking on Iâ?. I'm working to figure out if this should be supported, or if it's corruption introduced by the upstream logging system. Is this a legal user agent in a HTTP header?


回答1:


RFC 2616 (HTTP 1.1) says that message header contents must be "consisting of either *TEXT or combinations of token, separators, and quoted-string". If you look at the definitions for TEXT etc you will find that legal characters are those with byte values not in the [0, 31] range and not equal to 127; therefore characters such as â are as far as I can tell legal as per the spec.




回答2:


Technically, octets > 127 are allowed in comments. RFC 2616 makes them default to ISO-8859-1, but HTTPbis (the upcoming revision of RFC 2616) has removed that rule so that sometimes in the distant future, we may be able to move to a sane encoding.

Recommendation: strip all octets > 127.




回答3:


HTTP 1.1 RFC2616 refers to ISO-8859-1, which is a latin based single byte character set.

With the consideration that HTTP traffic is supposed to be single byte, I also am using the latin1 character set for my similar logs. The decision was simply to make my indexes smaller.

If you use UTF8 with VARCHAR, only the characters that are multi-byte require additional bytes, so in table space, it's not much extra. However, indexes are stored fixed-width, so, they're padded with spaces just in case you need them (UTF8 indexes are three times as large as latin1 indexes).

It doesn't affect me if the occasional odd header is unreadable. However, if you're not indexing the column, you may as well use UTF8.



来源:https://stackoverflow.com/questions/10384741/is-a-unicode-user-agent-legal-inside-an-http-header

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!