When should an asterisk be encoded in an HTTP URL?

岁酱吖の submitted on 2019-11-28 22:50:46

Short answer

The current definition of URL syntax indicates that you never need to percent-encode the asterisk character in the path, query, or fragment components of a URL.


HTTP 1.1

As @Riley Major pointed out, the RFC that HTTP 1.1 references for URL syntax has been obsoleted by RFC3986, which isn't as black and white about the use of asterisks as the originally referenced RFC was.

RFC2396 (URL spec before January 2005 - original answer)

An asterisk never needs to be encoded in HTTP 1.1 URLs as * is listed as an "unreserved character" in RFC2396, which is used to define URI syntax in HTTP 1.1. Unreserved characters are allowed in the path component of a URL.

2.3. Unreserved Characters

Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols.

   unreserved  = alphanum | mark

   mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
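To make the grammar above concrete, here is a minimal sketch of the RFC2396 "unreserved" set in Python. The names RFC2396_MARK, RFC2396_UNRESERVED and is_unreserved_rfc2396 are made up for illustration and do not come from any library.

```python
import string

# RFC 2396, section 2.3: unreserved = alphanum | mark
# (hypothetical helper names, used only for this illustration)
RFC2396_MARK = "-_.!~*'()"
RFC2396_UNRESERVED = set(string.ascii_letters + string.digits + RFC2396_MARK)


def is_unreserved_rfc2396(ch: str) -> bool:
    """Return True if the single character ch is 'unreserved' under RFC 2396."""
    return len(ch) == 1 and ch in RFC2396_UNRESERVED


print(is_unreserved_rfc2396("*"))  # True: "*" is one of the mark characters
print(is_unreserved_rfc2396("/"))  # False: "/" is reserved
```

Note that this only reflects the older RFC2396 grammar; it is not the current definition.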

RFC3986 (current URL syntax for HTTP)

RFC3986 modifies RFC2396 to make the asterisk a reserved character (it is now one of the "sub-delims"), giving the reason that it is one of the mark characters that are "typically unsafe to decode". My understanding of this RFC is that the unencoded asterisk character is still allowed in the path, query, and fragment components of a URL, as these components do not define the asterisk as a delimiter (2.2. Reserved Characters):

These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax... If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

Additionally, 3.3 Path confirms that a subset of reserved characters (sub-delims) can be used unencoded in path segments (parts of the path component broken up by /):

Aside from dot-segments ("." and "..") in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment. ... For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.
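As a rough illustration of the path rules, the sketch below checks a single path segment against my reading of the RFC3986 grammar (segment = *pchar, where pchar = unreserved / pct-encoded / sub-delims / ":" / "@"). The names SEGMENT_RE, is_valid_segment and encode_segment are hypothetical. It also shows one practical wrinkle: Python's urllib.parse.quote percent-encodes "*" by default, so the sub-delims have to be passed as safe characters if you want them left alone.

```python
import re
from urllib.parse import quote

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986, 3.3)
SEGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*")

SUB_DELIMS = "!$&'()*+,;="


def is_valid_segment(segment: str) -> bool:
    """Return True if segment is a syntactically valid RFC 3986 path segment."""
    return SEGMENT_RE.fullmatch(segment) is not None


def encode_segment(segment: str) -> str:
    """Percent-encode a path segment, leaving sub-delims, ":" and "@" as-is."""
    return quote(segment, safe=SUB_DELIMS + ":@")


print(is_valid_segment("a*b"))         # True: "*" is a sub-delim
print(is_valid_segment("name;v=1.1"))  # True
print(is_valid_segment("a b"))         # False: space must be encoded
print(quote("a*b"))                    # a%2Ab (quote's default encodes "*")
print(encode_segment("a*b"))           # a*b
```

Both "a*b" and "a%2Ab" are valid segments, so encoding the asterisk is allowed but not required.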

HTTP 1.0

HTTP 1.0 references RFC1738 to define URL syntax. RFC1738 has since been updated and obsoleted by later RFCs, so in effect HTTP 1.0 ends up using the same RFC as HTTP 1.1 for URL syntax.

As far as backwards compatibility goes, RFC1738 lists the asterisk among the special characters that may appear unencoded in a URL (it is in the "extra" set, which is part of "unreserved"), and HTTP 1.0 doesn't define any special meaning for an unencoded asterisk in the path component of a URL, so it shouldn't break anything if you use one. This should mean you're still safe putting asterisks in URLs that point to even the oldest systems.


As a side note, the asterisk character does have a special meaning in a Request-URI in the HTTP 1.1 spec (as far as I can tell, HTTP 1.0 only allows an absolute URI or an absolute path there), but it's not possible to represent it with an HTTP URL:

The asterisk "*" means that the request does not apply to a particular resource, but to the server itself, and is only allowed when the method used does not necessarily apply to a resource. One example would be

   OPTIONS * HTTP/1.1
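To see why a URL cannot produce that bare asterisk, here is a small sketch that derives the request target a client would normally send for an HTTP URL (path plus optional query). The helper request_target is hypothetical and the example.com URLs are placeholders. It assumes the usual behaviour of sending "/" when the path is empty.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-and-query request target for an HTTP URL (a sketch)."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path is sent as "/"
    if parts.query:
        target += "?" + parts.query
    return target


print(request_target("http://example.com/*"))  # /*
print(request_target("http://example.com"))    # /
```

Every target built this way starts with "/", so an asterisk in a URL ends up as part of the path ("/*") and never as the bare "*" used by OPTIONS.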

Disclaimer: I'm just reading and interpreting these RFCs myself, so I may be wrong.
