When a browser sends an HTTP request to a web server, what encoding is used to encode the HTTP protocol on the wire? Is it ASCII, UTF-8, or UTF-16? Or does it specify which en
RFC 2616 includes this:
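OCTET = <any 8-bit sequence of data>
CHAR = <any US-ASCII character (octets 0 - 127)>
UPALPHA = <any US-ASCII uppercase letter "A".."Z">
LOALPHA = <any US-ASCII lowercase letter "a".."z">
ALPHA = UPALPHA | LOALPHA
DIGIT = <any US-ASCII digit "0".."9">
CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
<"> = <US-ASCII double-quote mark (34)>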
And then pretty much everything else in the document is defined in terms of those entities (OCTET, CHAR, etc.). So you could look through the RFC to find out which parts of an HTTP request/response can include OCTETs; all other parts must be ASCII. (I'd do it myself, but it'd take a long time.)
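To make the CHAR versus OCTET distinction concrete, here is a minimal Python sketch of byte-level checks modeled on the RFC 2616 basic rules. The function names (is_char, is_ctl, is_pure_ascii) and the sample request line and body are made up for illustration; they are not taken from the RFC or from any library.

```python
# Byte-level predicates mirroring two of the RFC 2616 basic rules.
# All names here are hypothetical, chosen only for this sketch.

def is_char(b: int) -> bool:
    """CHAR: any US-ASCII character (octets 0 - 127)."""
    return 0 <= b <= 127


def is_ctl(b: int) -> bool:
    """CTL: US-ASCII control characters (octets 0 - 31) and DEL (127)."""
    return 0 <= b <= 31 or b == 127


def is_pure_ascii(data: bytes) -> bool:
    """True if every octet in data falls inside the CHAR range."""
    return all(is_char(b) for b in data)


# A request line built only from CHARs (sample values, assumed for the demo).
request_line = b"GET /index.html HTTP/1.1\r\n"

# A body is one place where arbitrary OCTETs can appear; UTF-8 is just an
# example encoding picked for this sketch.
body = "h\u00e9llo".encode("utf-8")

print(is_pure_ascii(request_line))  # True
print(is_pure_ascii(body))          # False
```

The request line passes because every byte is below 128, while the UTF-8 encoded body contains the bytes 0xC3 0xA9, which are OCTETs but not CHARs.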
For the request line specifically, the method name and HTTP version are going to be ASCII characters only. It might seem that the URL itself could include non-ASCII characters, but RFC 2396 says:
A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.
Which I guess means that the URL will consist of ASCII characters as well; anything outside that set has to be escaped (percent-encoded) before it goes on the wire.
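As a rough illustration of how a URL with non-ASCII characters ends up as plain ASCII in the request line, the sketch below percent-encodes a path using Python's standard urllib.parse.quote. The path "/wiki/café" is a made-up example, and using UTF-8 as the byte encoding before escaping is an assumption: it is quote's default and the common modern convention, but RFC 2396 itself does not mandate a particular charset.

```python
from urllib.parse import quote

# Hypothetical path containing a non-ASCII character (é).
path = "/wiki/caf\u00e9"

# quote() encodes the string as UTF-8 by default and replaces each
# non-ASCII byte with %XX; "/" is left alone because it is in the
# default safe set.
encoded = quote(path)
print(encoded)  # /wiki/caf%C3%A9

# The resulting request line can now be encoded strictly as ASCII.
request_line = f"GET {encoded} HTTP/1.1\r\n".encode("ascii")
print(request_line)
```

Encoding the unescaped path with .encode("ascii") would raise UnicodeEncodeError, which is one way to see that the raw character cannot appear on the wire as-is.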