Getting parts of a URL (Regex)

说谎 2020-11-22 02:13

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

How can I extract the following parts using regular expressions:

  1. The Subd
26 answers
  •  自闭症患者
    2020-11-22 03:11

    A single regex to parse and break up a full URL, including query parameters and anchors, e.g.

    https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash

    ^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)([^#]*)?(#[\w\-]+)?$

    RegEx positions:

    url: RegExp['$&'],
    protocol: RegExp.$2,
    host: RegExp.$3,
    path: RegExp.$4,
    file: RegExp.$6,
    query: RegExp.$7,
    hash: RegExp.$8

    You could then further parse the host ('.'-delimited) quite easily.
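
As a rough sketch of how the positions above could be read in current JavaScript, the example below uses `String.prototype.match` rather than the legacy `RegExp.$n` statics. It assumes the query group is restricted to non-`#` characters (`[^#]*`) so that a trailing `#hash` lands in its own group. `parseUrl` is a hypothetical helper name, not something from the original answer.

```javascript
// Single-regex URL split. Assumption: the query group is [^#]* so that
// a trailing "#hash" is left over for the final group.
const URL_PATTERN =
  /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)([^#]*)?(#[\w\-]+)?$/;

// parseUrl is a hypothetical helper, named here only for illustration.
function parseUrl(url) {
  const m = url.match(URL_PATTERN);
  if (m === null) return null; // the pattern does not cover every URL shape
  return {
    url: m[0],      // whole match, same as RegExp['$&']
    protocol: m[2], // same as RegExp.$2
    host: m[3],
    path: m[4],
    file: m[6],
    query: m[7],
    hash: m[8],
    hostParts: m[3].split('.'), // further parse of the '.'-delimited host
  };
}

const parts = parseUrl(
  'https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash'
);
console.log(parts);
```

Groups 1 and 5 are only helper groups for the optional protocol and the repeated path segments, which is why they are skipped in the mapping.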

    What I would do is use something like this:

    /*
        ^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$
    */
    proto $1
    host $2
    port $3
    the-rest $4
    

    Then further parse 'the rest' to be as specific as possible. Doing it all in one regex is, well, a bit crazy.
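
A minimal sketch of that two-step idea in JavaScript follows. The first pattern is the one from the comment above. The second pattern, and the helper names `splitUrl` and `splitRest`, are assumptions added for illustration: they show one possible way to break 'the rest' into path, query, and fragment.

```javascript
// Step 1: coarse split into protocol, host, optional port, and the rest.
const COARSE = /^(.*:)\/\/([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$/;

// splitUrl is a hypothetical helper name.
function splitUrl(url) {
  const m = url.match(COARSE);
  if (m === null) return null;
  return { proto: m[1], host: m[2], port: m[3], rest: m[4] };
}

// Step 2 (assumed, not from the original answer): split the rest into
// path, query string, and fragment.
function splitRest(rest) {
  const m = rest.match(/^([^?#]*)(\?[^#]*)?(#.*)?$/);
  if (m === null) return null; // e.g. input containing a line break
  return { path: m[1], query: m[2], hash: m[3] };
}

const first = splitUrl('http://test.example.com:8080/dir/subdir/file.html?x=1#top');
const second = splitRest(first.rest);
console.log(first, second);
```

One caveat with the coarse pattern: `(.*:)` is greedy, so a URL whose query string itself contains `://` would likely have its protocol group swallow everything up to that later occurrence. Anchoring the protocol more tightly (for example `[A-Za-z][A-Za-z0-9+.\-]*:`) is one way to avoid that.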
