Regular expression to remove hostname and port from URL?

别跟我提以往 2020-12-15 08:55

I need to write some JavaScript to strip the hostname:port part from a URL, meaning I want to extract only the path part.

That is, I want to write a function getPath(url) that returns only the path.

6 answers
  •  暖寄归人
    2020-12-15 09:43

    RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt) says in Appendix B:

    The following line is the regular expression for breaking-down a well-formed URI reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
    

    The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to

      http://www.ics.uci.edu/pub/ietf/uri/#Related
    

    results in the following subexpression matches:

      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related
    

    where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as

      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9
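
Applied to the question, the mapping above could be sketched in JavaScript roughly as follows. The regular expression is the one from Appendix B with the slashes escaped for a regex literal; the names URI_PARTS and parseUri are made up for this illustration, and getPath is the function name from the question. Note that the expression only splits a string into components and does not validate that it is a well-formed URI.

```javascript
// RFC 3986 Appendix B expression; "/" is escaped so it can live in a regex literal.
// URI_PARTS is a hypothetical name chosen for this sketch.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: splits a URI reference into the five components
// named in the RFC. Components that are not present come back as undefined.
function parseUri(uri) {
  // Every group except the path is optional and the path may be empty,
  // so exec() finds a match for any string input.
  const m = URI_PARTS.exec(String(uri));
  return {
    scheme: m[2],
    authority: m[4], // host plus optional ":port" (and userinfo, if any)
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// The function asked for in the question: drops scheme, host and port,
// and here also the query and fragment, keeping only the path ($5).
function getPath(url) {
  return parseUri(url).path;
}
```

With this sketch, getPath("http://www.google.com:80/docs") gives "/docs", and the RFC's own example URI gives "/pub/ietf/uri/". If the query string should be kept as well, it can be re-attached from the query component of parseUri.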
    
