Is a unicode user agent legal inside an HTTP header?

ε祈祈猫儿з 提交于 2019-12-05 03:56:26

RFC 2616 (HTTP 1.1) says that message header contents must be "consisting of either *TEXT or combinations of token, separators, and quoted-string". If you look at the definitions for TEXT etc you will find that legal characters are those with byte values not in the [0, 31] range and not equal to 127; therefore characters such as â are as far as I can tell legal as per the spec.

Technically, octets > 127 are allowed in comments. RFC 2616 makes them default to ISO-8859-1, but HTTPbis (the upcoming revision of RFC 2616) has removed that rule so that sometimes in the distant future, we may be able to move to a sane encoding.

Recommendation: strip all octets > 127.

HTTP 1.1 RFC2616 refers to ISO-8859-1, which is a latin based single byte character set.

With the consideration that HTTP traffic is supposed to be single byte, I also am using the latin1 character set for my similar logs. The decision was simply to make my indexes smaller.

If you use UTF8 with VARCHAR, only the characters that are multi-byte require additional bytes, so in table space, it's not much extra. However, indexes are stored fixed-width, so, they're padded with spaces just in case you need them (UTF8 indexes are three times as large as latin1 indexes).

It doesn't affect me if the occasional odd header is unreadable. However, if you're not indexing the column, you may as well use UTF8.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!