Characters allowed in a URL

没有蜡笔的小新 2020-11-22 05:33

Does anyone know the full list of characters that can be used within a GET without being encoded? At the moment I am using A-Z a-z and 0-9... but I am looking to find out th

9 answers
  • 2020-11-22 06:13

    From RFC 1738 specification:

    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

    EDIT: As @Jukka K. Korpela correctly points out, RFC 1738 was updated by RFC 3986, which expanded and clarified the characters valid in the host component. The grammar is not easy to copy and paste, but I'll do my best.

    The alternatives are tried in order, and the first match wins:

    host        = IP-literal / IPv4address / reg-name
    
    IP-literal  = "[" ( IPv6address / IPvFuture  ) "]"
    
    IPvFuture   = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
    
    IPv6address =         6( h16 ":" ) ls32
                      /                       "::" 5( h16 ":" ) ls32
                      / [               h16 ] "::" 4( h16 ":" ) ls32
                      / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                      / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                      / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                      / [ *4( h16 ":" ) h16 ] "::"              ls32
                      / [ *5( h16 ":" ) h16 ] "::"              h16
                      / [ *6( h16 ":" ) h16 ] "::"
    
    ls32        = ( h16 ":" h16 ) / IPv4address
                      ; least-significant 32 bits of address
    
    h16         = 1*4HEXDIG 
                   ; 16 bits of address represented in hexadecimal
    
    IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
    
    dec-octet   = DIGIT                 ; 0-9
                  / %x31-39 DIGIT         ; 10-99
                  / "1" 2DIGIT            ; 100-199
                  / "2" %x30-34 DIGIT     ; 200-249
                  / "25" %x30-35          ; 250-255
    
    reg-name    = *( unreserved / pct-encoded / sub-delims )
    
    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
                  ; a practical shortcut, closest to the original answer above
    
    reserved    = gen-delims / sub-delims
    
    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
    
    pct-encoded = "%" HEXDIG HEXDIG
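
    As a rough illustration of how the grammar above can be applied, here is a minimal sketch that turns three of the rules (dec-octet, IPv4address and reg-name) into regular expressions. The constant and function names are made up for this example, and the IPv6 rules are left out for brevity.

```javascript
// Sketch only: names below are illustrative, not part of any library.

// dec-octet: 250-255 / 200-249 / 100-199 / 10-99 / 0-9 (longest alternatives first)
const DEC_OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])";

// IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
const IPV4_ADDRESS = new RegExp(`^${DEC_OCTET}(?:\\.${DEC_OCTET}){3}$`);

// reg-name = *( unreserved / pct-encoded / sub-delims )
const REG_NAME = /^(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2}|[!$&'()*+,;=])*$/;

function isIPv4Address(host) {
  return IPV4_ADDRESS.test(host);
}

function isRegName(host) {
  return REG_NAME.test(host);
}

console.log(isIPv4Address("192.168.0.1")); // true
console.log(isIPv4Address("256.1.1.1"));   // false
console.log(isRegName("example.com"));     // true
console.log(isRegName("exa mple.com"));    // false (space is not allowed)
```

    Note that "256.1.1.1" still matches reg-name, which is why the order of the alternatives in the host rule matters.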
    
  • 2020-11-22 06:18

    The upcoming change is for Chinese and Arabic domain names, not URIs. Internationalised URIs are called IRIs and are defined in RFC 3987. That said, I'd recommend not doing this yourself and relying instead on an existing, tested library: there are many variants of URI encoding/decoding, and what the specification considers safe differs from what is safe in actual use (browsers).
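
    To make the "use a tested library" advice concrete, here is a small sketch using encoders that ship with browsers and Node.js. The sample strings and the example.com URL are arbitrary placeholders.

```javascript
// encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) untouched and
// percent-encodes everything else as UTF-8 bytes.
const encoded = encodeURIComponent("a b&c=d/\u00e9");
console.log(encoded); // a%20b%26c%3Dd%2F%C3%A9

// URLSearchParams builds a query string using form encoding (space becomes "+").
const params = new URLSearchParams({ q: "a b&c", lang: "zh" });
const query = params.toString();
console.log(query); // q=a+b%26c&lang=zh

// The URL class parses a whole URL and percent-encodes non-ASCII path characters.
const parsed = new URL("https://example.com/caf\u00e9?x=1");
console.log(parsed.pathname); // /caf%C3%A9
```

    Which encoder fits depends on the part of the URL being built: a single component, a query string, or a complete URL.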

  • 2020-11-22 06:20

    The full list of the 66 unreserved characters is in RFC 3986, section 2.3: http://tools.ietf.org/html/rfc3986#section-2.3

    This is any character in the following regex set:

    [A-Za-z0-9_.\-~]
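
    A quick sketch to check the figure of 66 and to use that set for encoding. The helper name encodeStrict is an assumption for this example; it builds on the built-in encodeURIComponent, which leaves ! ' ( ) * unencoded in addition to the unreserved set.

```javascript
// The unreserved set from the regex above, anchored to test one character.
const UNRESERVED = /^[A-Za-z0-9_.\-~]$/;

// Count the ASCII characters that match.
let count = 0;
for (let code = 0; code < 128; code++) {
  if (UNRESERVED.test(String.fromCharCode(code))) count++;
}
console.log(count); // 66

// Percent-encode everything except the unreserved characters:
// encodeURIComponent first, then a second pass for ! ' ( ) *.
function encodeStrict(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeStrict("Hello World!~(x)*'")); // Hello%20World%21~%28x%29%2A%27
```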
    
  • 2020-11-22 06:22

    The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding)

    http://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters

    says these are RFC 3986 unreserved characters (sec. 2.3) as well as reserved characters (sec 2.2) if they need to retain their special meaning. And also a percent character as part of a percent-encoding.
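
    The categories described here can be sketched as a small classifier. The function name and the returned labels are invented for illustration, and the function assumes it is given exactly one character.

```javascript
const GEN_DELIMS = ":/?#[]@";       // RFC 3986 gen-delims
const SUB_DELIMS = "!$&'()*+,;=";   // RFC 3986 sub-delims

// Classify a single character by its role in a URI (illustrative labels).
function classify(ch) {
  if (/^[A-Za-z0-9\-._~]$/.test(ch)) return "unreserved";
  if (GEN_DELIMS.includes(ch)) return "gen-delim";
  if (SUB_DELIMS.includes(ch)) return "sub-delim";
  if (ch === "%") return "percent";
  return "must-encode";
}

for (const ch of ["a", "~", "?", "=", "%", " "]) {
  console.log(JSON.stringify(ch), classify(ch));
}
```

    A reserved character may stay unencoded only where it is serving its delimiter role; used as data, it has to be percent-encoded.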

  • 2020-11-22 06:28

    If you want to give users a special kind of experience, you can use pushState to put a wide range of characters into the browser's URL:

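    // Build a string of 250 consecutive characters starting at code 250 * tt (42000)
    // and push it into the address bar without reloading the page.
    var tt = 168;
    var start = 250 * tt;
    var u = "";
    for (var i = 0; i < 250; i++) {
      var x = start + i;
      console.log(x); // log each character code
      u += String.fromCharCode(x);
    }
    // The new URL is the number 42000 followed by the generated characters.
    history.pushState({}, "", start + u);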
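
    Although the address bar may display such characters as-is, the URL itself is serialized in percent-encoded form. The sketch below shows this for the first character the loop generates, using the standard URL class; example.com is just a placeholder origin.

```javascript
// First character produced by the loop above: code 250 * 168 = 42000 (U+A410).
const first = String.fromCharCode(250 * 168);

// Parsing a URL percent-encodes non-ASCII path characters as UTF-8 bytes.
const pushed = new URL("/" + first, "https://example.com");
console.log(pushed.pathname); // /%EA%90%90

// Decoding recovers the original character.
console.log(decodeURIComponent(pushed.pathname) === "/" + first); // true
```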
    
  • 2020-11-22 06:29

    From here

    Thus, only alphanumerics, the special characters $-_.+!*'(), and reserved characters used for their reserved purposes may be used unencoded within a URL.
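
    For comparison with the RFC 3986 unreserved set mentioned in the other answers, here is a short sketch that lists where the two sets of non-alphanumeric characters differ. The variable names are made up for this example.

```javascript
const RFC1738_SPECIAL = "$-_.+!*'(),"; // special characters quoted from RFC 1738
const RFC3986_MARKS = "-._~";          // non-alphanumeric unreserved characters in RFC 3986

// Characters allowed unencoded by RFC 1738 but not unreserved in RFC 3986.
const only1738 = [...RFC1738_SPECIAL].filter((c) => !RFC3986_MARKS.includes(c));

// Characters unreserved in RFC 3986 but not in the RFC 1738 list.
const only3986 = [...RFC3986_MARKS].filter((c) => !RFC1738_SPECIAL.includes(c));

console.log(only1738.join(" ")); // $ + ! * ' ( ) ,
console.log(only3986.join(" ")); // ~
```

    The characters in the first list are all sub-delims in RFC 3986, so they are reserved there rather than unconditionally safe.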
