Regex to match the relative path of the URL

大兔子大兔子 posted on 2021-01-29 07:37:42

Question


How do I write a regex that matches all three of the cases below? The path, file, and query string have to match exactly. The domain part can be any of the following variants (domain name or IP address):

http://www.example.com/path1/path2/foobar.aspx?id=123&key=456
https://www.example.com/path1/path2/foobar.aspx?id=123&key=456
64.123.456.789/path1/path2/foobar.aspx?id=123&key=456

Basically, only /path1/path2/foobar.aspx?id=123&key=456 needs to be matched. The part in front of it can be any variant that leads the user to the site.


Answer 1:


Code

\.[^\/]+(.*)


This regex captures the relative path of the address in its first capture group. In your program, read that capture group rather than the full match, because the full match also includes part of the domain.
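As a minimal sketch of reading the capture rather than the match, assuming Python's built-in re module (the question does not name a language, and the helper name relative_path is made up for illustration):

```python
import re

# The pattern from this answer; group 1 holds the relative path.
RELATIVE_PATH = re.compile(r"\.[^\/]+(.*)")

def relative_path(url):
    """Return the captured relative path, or None if the pattern does not match."""
    match = RELATIVE_PATH.search(url)
    return match.group(1) if match else None

# The three URLs from the question.
urls = [
    "http://www.example.com/path1/path2/foobar.aspx?id=123&key=456",
    "https://www.example.com/path1/path2/foobar.aspx?id=123&key=456",
    "64.123.456.789/path1/path2/foobar.aspx?id=123&key=456",
]

for url in urls:
    print(relative_path(url))
```

Each of the three URLs should print /path1/path2/foobar.aspx?id=123&key=456, while match.group(0) would additionally contain the text from the first dot onward.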


Explanation

\.              Matches the first dot of the address
  [^\/]+        Matches one or more characters that aren't forward slashes
        (.*)    Captures the rest of the address
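The same breakdown can be kept next to the pattern itself. The sketch below assumes Python and its re.VERBOSE flag, which ignores whitespace in the pattern and allows inline comments; the name ANNOTATED is chosen only for illustration.

```python
import re

# Same pattern as above, written in verbose mode so each piece carries its comment.
ANNOTATED = re.compile(
    r"""
    \.          # the first dot of the address (the leftmost match is used)
    [^\/]+      # one or more characters that are not forward slashes
    (.*)        # group 1: the rest of the address
    """,
    re.VERBOSE,
)

match = ANNOTATED.search("64.123.456.789/path1/path2/foobar.aspx?id=123&key=456")
print(match.group(1))
```

Verbose mode changes only how the pattern is written, not what it matches, so the capture is the same as with the compact form.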

Further Explanation

I can't match the relative path directly (rather than capture it) because I have no expression that reliably marks the beginning of the relative path without also matching other characters.

This is because some addresses have a protocol part (e.g. http://) whereas others don't. Because of the two extra forward slashes, the regex would have to be much longer to verify that it has reached the correct forward slash.

I used the first dot since all addresses (as far as I know) have a dot in the domain (www.something.com or 64.123.456.789). Since the domain is always immediately before the relative path, we can just skip to the next forward slash and always arrive at the relative path.

Then we just capture the rest of the address (including the first forward slash), which is then easy to read from the capture group.
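The question also asks for the path, file, and query string to be exact, which the pattern by itself does not enforce. One possible way to add that check, sketched here in Python with a hypothetical helper is_expected_path, is to compare the capture against the expected string. The other.aspx and localhost URLs are made-up inputs, not taken from the question; the localhost one illustrates what happens when the host contains no dot.

```python
import re

RELATIVE_PATH = re.compile(r"\.[^\/]+(.*)")
EXPECTED = "/path1/path2/foobar.aspx?id=123&key=456"

def is_expected_path(url):
    """True only if the captured relative path equals EXPECTED exactly."""
    match = RELATIVE_PATH.search(url)
    return match is not None and match.group(1) == EXPECTED

# URL from the question: the capture equals EXPECTED.
print(is_expected_path("https://www.example.com/path1/path2/foobar.aspx?id=123&key=456"))

# Hypothetical URL with a different file name: the capture differs.
print(is_expected_path("www.example.com/path1/path2/other.aspx?id=123&key=456"))

# Hypothetical host without a dot: the first dot is the one in "foobar.aspx",
# so the capture is empty and the comparison fails.
print(is_expected_path("localhost/path1/path2/foobar.aspx?id=123&key=456"))
```

A plain string comparison is used instead of embedding the path in the regex so that characters such as ? and . in the expected path need no escaping.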



Source: https://stackoverflow.com/questions/52043172/regex-to-match-the-relative-path-of-the-url
