Regex URL Path from URL

Question

I am having a little bit of regex trouble.

I am trying to extract the path, videoplay, from this URL:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello

If I use the regex /.+ the match starts at the // in front of the host, so it includes //video.google.co.uk:80 as well.

I need some kind of negative match that excludes the //.

Answer 1:

If you need this for a JavaScript web app, the best answer I have found on this topic is here. The basic (and original) version of the code looks like this:

var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";

parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"

Thank you John Long, you made my day!

Answer 2:

(http[s]?:\/\/)?([^\/\s]+\/)(.*)

Group 3 holds the path.
Demo: http://regex101.com/r/vK4rV7/1
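
As a rough sketch of how group 3 could be used in JavaScript: the pattern is the one above, while the names input, rest and pathOnly are only illustrative. Note that group 3 still carries the query string and fragment, so this sketch cuts them off with a split, which is my own addition and not part of the answer.

```javascript
// The pattern from this answer; group 3 holds everything after the host.
const re = /(http[s]?:\/\/)?([^\/\s]+\/)(.*)/;
const input = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

const match = input.match(re);

// Group 3 still contains the query string and the fragment.
const rest = match ? match[3] : "";

// Keep only the part before the first "?" or "#" (an extra step, not in the answer).
const pathOnly = rest.split(/[?#]/)[0];

console.log(rest);     // videoplay?docid=-7246927612831078230&hl=en#hello
console.log(pathOnly); // videoplay
```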

Answer 3:

This expression captures videoplay and everything after it, i.e. the URL path plus the query string and fragment:

/\/(videoplay.+)/

This expression captures everything after the port, which again includes the path:

/:\d+\/(.+)/
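
A minimal sketch of the port-based approach, assuming the URL has an explicit port written as a colon, one or more digits, and then the slash that starts the path. The names videoUrl and afterPort are made up for the example.

```javascript
const videoUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

// Assumption: ":" + digits + "/" marks the end of the host and port.
const portThenPath = /:\d+\/(.+)/;
const afterPort = videoUrl.match(portThenPath);

console.log(afterPort ? afterPort[1] : null);
// videoplay?docid=-7246927612831078230&hl=en#hello

// Without an explicit port there is nothing for ":\d+\/" to match.
const noPort = "http://video.google.co.uk/videoplay".match(portThenPath);
console.log(noPort); // null
```

So this only helps when you know a port is always present.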

However, if you are using Node.js, I recommend the native url module:

var url = require('url')
var youtubeUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
url.parse(youtubeUrl)

It does all of the regex work for you and returns:

{
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'video.google.co.uk:80',
  port: '80',
  hostname: 'video.google.co.uk',
  hash: '#hello',
  search: '?docid=-7246927612831078230&hl=en',
  query: 'docid=-7246927612831078230&hl=en',
  pathname: '/videoplay',
  path: '/videoplay?docid=-7246927612831078230&hl=en',
  href: 'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello' 
}

Answer 4:

You can try this:

^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$

The capturing group ([^?]+) above returns your path.

Please note that this is not an all-URL regex. It just solves your problem of matching all the text between the first "/" occurring after "//" and the following "?" character.

If you need an all-matching regex, you can check this Stack Overflow link, where they have discussed and dissected every possible form of a URI into its constituent parts, including your "path".
If you consider that overkill AND you know that your input URL will always have your path between the first "/" and the following "?", then the above regex should be sufficient.
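
Here is a small sketch of how the regex above might be used from JavaScript. The slashes are escaped so it can be written as a regex literal, and pathBeforeQuery is a hypothetical helper name. The last call shows the limitation mentioned above: a fragment without a query string ends up in the captured group.

```javascript
// Same pattern as above, with "/" escaped for use in a JavaScript regex literal.
const pathRe = /^(?:[^\/]*(?:\/(?:\/[^\/]*\/?)?)?([^?]+)(?:\??.+)?)$/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function pathBeforeQuery(input) {
  const m = pathRe.exec(input);
  return m ? m[1] : null;
}

console.log(pathBeforeQuery("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"));
// videoplay

console.log(pathBeforeQuery("http://example.com/a/b?x=1"));
// a/b

// No "?" in the input, so the fragment is captured along with the path.
console.log(pathBeforeQuery("http://example.com/a#frag"));
// a#frag
```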

Answer 5:

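function getPath(url, defaults){
    // Optional scheme, "//", the host, then the path up to the first "?" or "#".
    // The captured path must start with "/", so a URL without a path
    // does not return the tail of the host name.
    var reUrlPath = /(?:\w+:)?\/\/[^\/?#]+(\/[^?#]*)/;
    // If there is no match, fall back to the supplied default.
    var urlParts = url.match(reUrlPath) || [url, defaults];
    return urlParts.pop();
}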
alert( getPath('http://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('https://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('//stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/', 'unknown') );

Answer 6:

You mean a negative lookbehind? (?<!/)
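
For what it's worth, a lookbehind alone is probably not enough here: the first slash of // is preceded by ":", not by "/". A sketch that pairs it with a negative lookahead is below. It assumes an engine with lookbehind support (ES2018, so current Node.js and recent browsers), and the variable names are only illustrative.

```javascript
const input = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

// Match a "/" that is neither preceded nor followed by another "/",
// then take everything up to the first "?" or "#".
const pathWithLookaround = /(?<!\/)\/(?!\/)[^?#]*/;

const found = input.match(pathWithLookaround);
console.log(found ? found[0] : null); // /videoplay
```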

Answer 7:

It's not a regex solution, but most languages have a URL library that will parse any URL into its constituent parts. That may be a better fit for what you are doing.
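
In JavaScript, one such option is the built-in URL class, which exists in modern browsers and in Node.js. This is only a sketch, assuming that class is available in your environment; parsed is an illustrative name.

```javascript
// URL is a global in modern browsers and in Node.js, so no import is needed.
const parsed = new URL("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello");

console.log(parsed.pathname); // /videoplay
console.log(parsed.search);   // ?docid=-7246927612831078230&hl=en
console.log(parsed.hash);     // #hello
console.log(parsed.hostname); // video.google.co.uk

// 80 is the default port for http, so the parser drops it and port is an empty string.
console.log(parsed.port === ""); // true
```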

Answer 8:

var subject =
'<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=ec617d715196"><link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"><link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">';
var re=/\"[a-z]+:\/\/[^ ]+"/m;
document.write(subject.match(re));

You can try this

/\"[a-z]+:\/\/[^ ]+/

Usage

if (/\"[a-z]+:\/\/[^ ]+/m.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}

Answer 9:

I think this is what you're after: [^/]+$

Demo: http://regex101.com/r/rG8gB9
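
A quick sketch of what this pattern returns for the URL in the question. It grabs everything after the last slash, so the query string and fragment come along; the split that trims them is my own assumption about what you want, and the names are illustrative.

```javascript
const input = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

// Everything after the last "/" in the string.
const lastSegment = input.match(/[^\/]+$/);
const tail = lastSegment ? lastSegment[0] : "";
console.log(tail); // videoplay?docid=-7246927612831078230&hl=en#hello

// Extra step (not part of the pattern): drop the query string and fragment.
const nameOnly = tail.split(/[?#]/)[0];
console.log(nameOnly); // videoplay
```

Keep in mind that it only returns the last path segment, so a path like /a/b would come back as b.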

Source: https://stackoverflow.com/questions/12023430/regex-url-path-from-url

Tags

javascript

regex

node.js

url