Getting parts of a URL (Regex)

说谎 2020-11-22 02:13

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

How can I extract the following parts using regular expressions:

  1. The Subd
26 answers
  •  孤城傲影
    2020-11-22 02:45

    I was trying to solve this in JavaScript, where the built-in URL API should handle it:

    var url = new URL('http://a:b@example.com:890/path/wah@t/foo.js?foo=bar&bingobang=&king=kong@kong.com#foobar/bing/bo@ng?bang');
    

    since (in Chrome, at least) it parses to:

    {
      "hash": "#foobar/bing/bo@ng?bang",
      "search": "?foo=bar&bingobang=&king=kong@kong.com",
      "pathname": "/path/wah@t/foo.js",
      "port": "890",
      "hostname": "example.com",
      "host": "example.com:890",
      "password": "b",
      "username": "a",
      "protocol": "http:",
      "origin": "http://example.com:890",
      "href": "http://a:b@example.com:890/path/wah@t/foo.js?foo=bar&bingobang=&king=kong@kong.com#foobar/bing/bo@ng?bang"
    }
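
    One way to produce a listing like the one above is to copy the properties off the URL instance into a plain object. This is only a minimal sketch: `describeUrl` and `URL_PROPS` are illustrative names that do not appear in the original answer, and it assumes an environment where the global `URL` constructor exists.

```javascript
// Property names taken from the object listing above.
var URL_PROPS = ["hash", "search", "pathname", "port", "hostname", "host",
                 "password", "username", "protocol", "origin", "href"];

// Hypothetical helper: returns a plain object with the URL's parts.
function describeUrl(input) {
    // new URL() throws a TypeError when input is not a valid absolute URL.
    var url = new URL(input);
    var parts = {};
    URL_PROPS.forEach(function (key) {
        parts[key] = url[key];
    });
    return parts;
}

console.log(JSON.stringify(
    describeUrl("http://a:b@example.com:890/path/foo.js?foo=bar#frag"), null, 2));
```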
    

    However, this isn't cross-browser (see https://developer.mozilla.org/en-US/docs/Web/API/URL), so I cobbled together this regex to pull out the same parts as above:

    ^(?:(?:(([^:\/#\?]+:)?(?:(?:\/\/)(?:(?:(?:([^:@\/#\?]+)(?:\:([^:@\/#\?]*))?)@)?(([^:\/#\?\]\[]+|\[[^\/\]@#?]+\])(?:\:([0-9]+))?))?)?)?((?:\/?(?:[^\/\?#]+\/+)*)(?:[^\?#]*)))?(\?[^#]+)?)(#.*)?
    

    Credit for this regex goes to https://gist.github.com/rpflorence, who posted this jsperf, http://jsperf.com/url-parsing (originally found here: https://gist.github.com/jlong/2428561#comment-310066), and who came up with the regex this one was originally based on.

    The capture groups come out in this order (index 0 is the whole match):

    var keys = [
        "href",                    // http://user:pass@host.com:81/directory/file.ext?query=1#anchor
        "origin",                  // http://user:pass@host.com:81
        "protocol",                // http:
        "username",                // user
        "password",                // pass
        "host",                    // host.com:81
        "hostname",                // host.com
        "port",                    // 81
        "pathname",                // /directory/file.ext
        "search",                  // ?query=1
        "hash"                     // #anchor
    ];
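
    Putting the regex and the key list together might look like the sketch below. The regex is copied unchanged from above; `URL_REGEX` and `parseUrl` are illustrative names, and turning unmatched groups into empty strings is an assumption made here, not something the original answer specifies.

```javascript
// The regex from above as a JavaScript literal.
var URL_REGEX = /^(?:(?:(([^:\/#\?]+:)?(?:(?:\/\/)(?:(?:(?:([^:@\/#\?]+)(?:\:([^:@\/#\?]*))?)@)?(([^:\/#\?\]\[]+|\[[^\/\]@#?]+\])(?:\:([0-9]+))?))?)?)?((?:\/?(?:[^\/\?#]+\/+)*)(?:[^\?#]*)))?(\?[^#]+)?)(#.*)?/;

// keys[i] names the value found at match[i].
var keys = ["href", "origin", "protocol", "username", "password",
            "host", "hostname", "port", "pathname", "search", "hash"];

// Hypothetical helper: maps each capture group to its key.
function parseUrl(str) {
    var match = URL_REGEX.exec(str);
    var parts = {};
    keys.forEach(function (key, i) {
        // Optional groups that did not participate are undefined; use "" instead.
        parts[key] = (match && match[i]) || "";
    });
    return parts;
}

var parts = parseUrl("http://test.example.com/dir/subdir/file.html");
console.log(parts.hostname); // test.example.com
console.log(parts.pathname); // /dir/subdir/file.html
```

    Note that, unlike the `origin` property of the URL API, the `origin` group here includes the username and password when they are present, as the comment in the key list shows.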
    

    There is also a small library which wraps it and provides query params:

    https://github.com/sadams/lite-url (also available on Bower)

    If you have an improvement, please create a pull request with more tests and I will accept and merge with thanks.
