Why do some query strings work even if parameters are not URL-encoded?

Submitted by 我与影子孤独终老i on 2019-11-27 09:22:59
unor

The reserved characters of a URI are mostly used as delimiters. That doesn't mean they may not be used; it only means that they have a special purpose. If you want one of them to appear as plain data rather than serve that purpose, you have to percent-encode it.

The query component starts after the first ? and ends before the first # (if any; otherwise at the end of the URI). For the query component itself, there are no reserved characters defined.
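As a rough illustration of where the query begins and ends, here is a small Python sketch using the standard library's `urllib.parse.urlsplit`. The URI is hypothetical (the host, path, and fragment are made up for the example); note that the second ? stays inside the query as data.

```python
from urllib.parse import urlsplit

# Hypothetical URI: host, path, and fragment are invented for illustration.
uri = "https://example.com/viewer?embedded=true&url=http://example.org/a?b=c#page=2"

parts = urlsplit(uri)

# Only the first "?" ends the path; only the first "#" ends the query.
print(parts.path)      # /viewer
print(parts.query)     # embedded=true&url=http://example.org/a?b=c
print(parts.fragment)  # page=2
```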

The URI standard RFC 3986 defines that the query component can contain these characters:

  • a-z, A-Z
  • 0-9
  • / ? : @ ! $ & ' ( ) * + , ; = - . _ ~
  • percent-encoded characters

It even explicitly mentions:

The characters slash ("/") and question mark ("?") may represent data within the query component.


The query component of your example URI is this:

embedded=true&url=http://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf

Apart from letters, it contains =, &, :, /, ., ?, _, all of which are allowed in the query.
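The same check can be sketched in Python. The helper name `disallowed_chars` is made up for this example, and the allowed set is simply the list given above; it is a simplification, not a full RFC 3986 validator.

```python
import string

# Characters the list above permits literally in a query component.
ALLOWED = set(string.ascii_letters + string.digits + "/?:@!$&'()*+,;=-._~")

def disallowed_chars(query):
    """Return the characters in `query` that would need percent-encoding."""
    # "%" is tolerated as the start of a percent-encoded triplet; this sketch
    # does not verify that two hex digits actually follow it.
    return {ch for ch in query if ch not in ALLOWED and ch != "%"}

query = ("embedded=true&url=http://journals.plos.org/plosone/s/file"
         "?id=wjVg/PLOSOne_formatting_sample_main_body.pdf")

print(sorted(disallowed_chars(query)))      # []
print(sorted(disallowed_chars("q=a b#c")))  # [' ', '#']
```

A space or a literal # in a value, by contrast, is outside the allowed set and has to be percent-encoded.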

Note that the name=value format (separated by &) in the query component is just a convention, not something defined in the specification.
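To see how a typical parser applies that convention, here is a minimal sketch with Python's `urllib.parse.parse_qsl`. It splits pairs on & and each pair on its first =, so the later ? and = inside the url value remain part of that value. Other parsers may behave differently; this only shows one common implementation.

```python
from urllib.parse import parse_qsl

query = ("embedded=true&url=http://journals.plos.org/plosone/s/file"
         "?id=wjVg/PLOSOne_formatting_sample_main_body.pdf")

# parse_qsl splits on "&", then splits each pair on its first "=" only.
pairs = parse_qsl(query)
for name, value in pairs:
    print(name, "->", value)
```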

This is from the relevant RFC, 1738:

https://www.ietf.org/rfc/rfc1738.txt

3.3. HTTP

The HTTP URL scheme is used to designate Internet resources
accessible using HTTP (HyperText Transfer Protocol).

The HTTP protocol is specified elsewhere. This specification only
describes the syntax of HTTP URLs.

An HTTP URL takes the form:

  http://<host>:<port>/<path>?<searchpart>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.

Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.

The special characters in "http://" only have their special meaning in the scheme ("protocol") prefix at the start of the URL. Most browsers let you omit that prefix and assume "http://".

The first "?" separates the "path" from the "searchpart". Each "&" separates different arguments in the "searchpart".

Your browser should differentiate between ?embedded=true and &url=http://www.pdf995.com/samples/pdf.pdf.

Hope that helps.

Because some characters in a URL have special meanings: a question mark (?) separates the path from the query, and an ampersand (&) separates key-value pairs. If such a character appeared inside a value where it could be read as a delimiter, the parser could not tell data from structure, so we encode it to make sure the data is unambiguous. The characters in your example are not ambiguous, because each one appears in a position where the HTTP URL scheme allows it.
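A short Python sketch of a case where encoding does matter: a value that itself contains & and =. The parameter name `q` and the value are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

# Unencoded: the "&" inside the intended value is read as a pair separator.
raw = "q=a&b=c"
print(parse_qs(raw))      # {'q': ['a'], 'b': ['c']}

# Encoded: urlencode percent-encodes "&" and "=" inside the value.
encoded = urlencode({"q": "a&b=c"})
print(encoded)            # q=a%26b%3Dc
print(parse_qs(encoded))  # {'q': ['a&b=c']}
```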
