URL encoding the space character: + or %20?

迷失自我 2020-11-22 02:00

When is a space in a URL encoded to +, and when is it encoded to %20?

4 Answers
  •  没有蜡笔的小新
    2020-11-22 02:52

    This confusion is because URLs are still 'broken' to this day.

    Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs have had a very well-defined structure since the first specification in 1994.

    We can extract detailed information about the "http://www.google.com" URL:

    +---------------+-------------------+
    |      Part     |      Data         |
    +---------------+-------------------+
    |  Scheme       | http              |
    |  Host         | www.google.com    |
    +---------------+-------------------+
    

    If we look at a more complex URL such as:

    "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

    we can extract the following information:

    +-------------------+---------------------+
    |        Part       |       Data          |
    +-------------------+---------------------+
    |  Scheme           | https               |
    |  User             | bob                 |
    |  Password         | bobby               |
    |  Host             | www.lunatech.com    |
    |  Port             | 8080                |
    |  Path             | /file;p=1           |
    |  Path parameter   | p=1                 |
    |  Query            | q=2                 |
    |  Fragment         | third               |
    +-------------------+---------------------+
    
    https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
    \___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
      |      |    |          |          |      | \_/  |    |
    Scheme User Password    Host       Port  Path |   | Fragment
            \_____________________________/       | Query
                           |               Path parameter
                       Authority
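
    As a rough illustration of the breakdown above, here is a minimal sketch using Python's standard urllib.parse module. Python is my choice here, not something the original answer uses. Note that urlsplit leaves the path parameter inside the path, so the sketch splits it off by hand.

```python
from urllib.parse import urlsplit

url = "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # bob:bobby@www.lunatech.com:8080  (the authority)
print(parts.username)  # bob
print(parts.password)  # bobby
print(parts.hostname)  # www.lunatech.com
print(parts.port)      # 8080
print(parts.path)      # /file;p=1
print(parts.query)     # q=2
print(parts.fragment)  # third

# urlsplit keeps ";p=1" as part of the path; separate it manually.
segment, _, path_param = parts.path.partition(";")
print(segment)     # /file
print(path_param)  # p=1
```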
    

    The reserved characters are different for each part.

    For HTTP URLs, a space in a path segment has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in a path segment can be left unencoded.

    Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".

    This means that the "blue+light blue" string has to be encoded differently in the path and query parts:

    "http://example.com/blue+light%20blue?blue%2Blight+blue".

    From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
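
    A minimal sketch of that point, assuming Python's standard urllib.parse module (my choice of language, not the original answer's). Encoding each part on its own reproduces the example URL above, while encoding the assembled URL in one pass escapes the delimiters too.

```python
from urllib.parse import quote, quote_plus

text = "blue+light blue"

# Path segment: space -> %20; "+" is kept literal by adding it to `safe`.
path_segment = quote(text, safe="+")

# Query part: space -> "+", so a literal "+" must become %2B.
query_part = quote_plus(text)

url = f"http://example.com/{path_segment}?{query_part}"
print(url)  # http://example.com/blue+light%20blue?blue%2Blight+blue

# Encoding the assembled URL in one pass cannot work: the encoder has no way
# to tell delimiters from data, so ":" and "?" are escaped as well.
naive = quote("http://example.com/blue+light blue?blue+light blue")
print(naive)  # http%3A//example.com/blue%2Blight%20blue%3Fblue%2Blight%20blue
```

    By default quote would also escape "+" to "%2B" in the path, which is harmless; the safe="+" argument is only there to match the example URL exactly.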

    This boils down to:

    You should have %20 before the ? and + after.
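
    The same rule applies in reverse when decoding. A small sketch, again assuming Python's urllib.parse (the parameter names q and r are made up for illustration): unquote treats "+" as a literal plus sign, which suits the path, while unquote_plus and parse_qs turn "+" into a space, which suits the query.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Path rules: "%20" is a space, "+" stays a plus sign.
path_text = unquote("blue+light%20blue")

# Query rules: "+" is a space, "%2B" is a plus sign.
query_text = unquote_plus("blue%2Blight+blue")

# parse_qs applies the query rules and accepts both "+" and "%20" as a space.
params = parse_qs("q=blue%2Blight+blue&r=a%20b")

print(path_text)   # blue+light blue
print(query_text)  # blue+light blue
print(params)      # {'q': ['blue+light blue'], 'r': ['a b']}
```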

    Source
