Regex to match YouTube URLs

こ雲淡風輕ζ submitted on 2020-01-03 06:40:23

Question


I am trying to validate a YouTube URL using regex:

preg_match('~http://(?:www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]+~', $videoLink)

It kind of works, but it can match URLs that are malformed. For example, this will match OK:

http://www.youtube.com/watch?v=Zu4WXiPRek

But so will this:

http://www.youtube.com/watch?v=Zu4WX£&P!ek

And this won't:

http://www.youtube.com/watch?v=!Zu4WX£&P4ek

I think it's because of the + operator. It seems to match only the first few characters after v=, when it needs to match everything after v= against [a-zA-Z0-9-]. Any help is appreciated, thanks.


Answer 1:


The problem is that the pattern is not anchored and does not require any particular number of characters in the v= part of the URL, so preg_match succeeds as soon as some prefix of the string matches. So, for instance, checking

http://www.youtube.com/watch?v=Zu4WX£&P!ek

will match

http://www.youtube.com/watch?v=Zu4WX

and therefore return true. You need to either specify the number of characters you need in the v= part:

preg_match('~http://(?:www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]{10}~', $videoLink)

or specify that the group [a-zA-Z0-9-] must be the last part of the string:

preg_match('~http://(?:www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]+$~', $videoLink)

Your other example

http://www.youtube.com/watch?v=!Zu4WX£&P4ek

does not match, because + requires at least one character from [a-zA-Z0-9-] immediately after v=, and ! is not in that set.
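
To see which part of the string an unanchored pattern actually matched, you can inspect the third argument of preg_match. This is only a minimal sketch: the ^ anchor, the optional (?:www\.)? prefix and the variable names are my assumptions, not part of the question.

```php
<?php
// Without an end anchor, the match may stop before the end of the string.
$unanchored = '~^http://(?:www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]+~';
// With $, the ID characters must run all the way to the end of the string.
$anchored   = '~^http://(?:www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]+$~';

$url = 'http://www.youtube.com/watch?v=Zu4WX£&P!ek';

$unanchoredResult = preg_match($unanchored, $url, $m);
$matchedPart      = $m[0] ?? null;   // the prefix that satisfied the pattern
$anchoredResult   = preg_match($anchored, $url);

echo $unanchoredResult, ' ', $matchedPart, PHP_EOL; // 1 http://www.youtube.com/watch?v=Zu4WX
echo $anchoredResult, PHP_EOL;                      // 0
```

The unanchored pattern reports success even though it only consumed the URL up to Zu4WX; the anchored one rejects the same input.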




Answer 2:


Here is an alternative that is longer and much less elegant than a single regex, but it uses PHP's native URL parsing functions, so it may be a bit more reliable in the long run:

 $url = "http://www.youtube.com/watch?v=Zu4WXiPRek";

 $query_string = parse_url($url, PHP_URL_QUERY); // v=Zu4WXiPRek

 $query_string_parsed = array();                        
 parse_str($query_string, $query_string_parsed); // an array with all GET params

 echo($query_string_parsed["v"]); // Will output Zu4WXiPRek that you can then
                                  // validate for [a-zA-Z0-9] using a regex
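
The steps above could be wrapped into one function along these lines. It is a sketch under assumptions: extract_video_id is a hypothetical name, and the allowed character set (letters, digits, hyphen, underscore) is a guess at what a video ID may contain.

```php
<?php
// Hypothetical helper: returns the "v" parameter if it looks like a video ID, else null.
function extract_video_id(string $url): ?string
{
    // parse_url returns null when there is no query string, false on a badly malformed URL.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    $params = [];
    parse_str($query, $params);   // fills $params with all decoded GET parameters

    $id = $params['v'] ?? null;
    // Assumed character set; the D modifier makes $ match only at the very end.
    if (is_string($id) && preg_match('/^[a-zA-Z0-9_-]+$/D', $id)) {
        return $id;
    }
    return null;
}

var_dump(extract_video_id('http://www.youtube.com/watch?v=Zu4WXiPRek'));
var_dump(extract_video_id('http://www.youtube.com/watch?feature=share&v=Zu4WXiPRek'));
var_dump(extract_video_id('http://www.youtube.com/watch?v=Zu4WX!ek'));
```

Because parse_str does the splitting, the position of v in the query string no longer matters.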



Answer 3:


Short answer:

preg_match('%(http://www\.youtube\.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s]|$)%', $videoLink)

There are a few assumptions made here, so let me explain:

  • I added a capturing group ( ... ) around the entire http://www.youtube.com/watch?v=blah part of the link, so that we can say "I want to get the whole validated link up to and including the ?v=movieHash".
  • I added the non-capturing group (?: ... ) around your character set [a-zA-Z0-9-] and left the + sign outside of it. This lets us match all allowable characters up to a certain point.
  • Most importantly, you need to tell it how you expect your link to terminate. I'm taking a guess for you with (?:[&"\'\s]|$):

    - Will it be in HTML (e.g. inside an anchor tag)? If so, the link in href will end with a " or '.
    - Or maybe there's more to the query string, so there would be an & after the value of v.
    - Maybe there's a space or line break (\s) after the end of the link.
    - Or the link is the last thing in the input, which is what the |$ alternative covers.

The important piece is that you can get much more accurate results if you know what's surrounding what you are searching for, as is the case with many regular expressions.

This non-capturing group (in which I'm making assumptions for you) will take a stab at finding and ignoring all the extra junk after what you care about (the ?v=awesomeMovieHash).

Results:

http://www.youtube.com/watch?v=Zu4WXiPRek
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WXiPRek

http://www.youtube.com/watch?v=Zu4WX&a=b
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WX

http://www.youtube.com/watch?v=!Zu4WX£&P4ek
 - No match

a href="http://www.youtube.com/watch?v=Zu4WX&size=large"
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WX

http://www.youtube.com/watch?v=Zu4WX£&P!ek
 - No match
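
The results listed here can be checked with a short loop. This is a sketch, assuming a terminator group that also accepts end of input (|$) and escaped dots in the host name; the variable names are mine.

```php
<?php
$pattern = '%(http://www\.youtube\.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s]|$)%';

$samples = [
    'http://www.youtube.com/watch?v=Zu4WXiPRek',
    'http://www.youtube.com/watch?v=Zu4WX&a=b',
    'http://www.youtube.com/watch?v=!Zu4WX£&P4ek',
    'a href="http://www.youtube.com/watch?v=Zu4WX&size=large"',
    'http://www.youtube.com/watch?v=Zu4WX£&P!ek',
];

$results = [];
foreach ($samples as $sample) {
    // $m[1] is capturing group 1: the validated link up to the end of the v value.
    $results[] = preg_match($pattern, $sample, $m) ? $m[1] : null;
}

foreach ($results as $i => $link) {
    echo $samples[$i], ' => ', $link ?? 'No match', PHP_EOL;
}
```

The two inputs containing £ fail because £ is neither an allowed ID character nor one of the accepted terminators.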



Answer 4:


The "v=..." blob is not guaranteed to be the first parameter in the query part of the URL. I'd recommend using PHP's parse_url() function to break the URL into its component parts. You can also reassemble a pristine URL if someone began the string with "https://" or simply used "youtube.com" instead of "www.youtube.com", etc.

function get_youtube_vidid ($url) {
    $vidid = false;
    $valid_schemes = array ('http', 'https');
    $valid_hosts = array ('www.youtube.com', 'youtube.com');
    $valid_paths = array ('/watch');

    $bits = parse_url ($url);
    if (! is_array ($bits)) {
        return false;
    }
    if (! (array_key_exists ('scheme', $bits)
            and array_key_exists ('host', $bits)
            and array_key_exists ('path', $bits)
            and array_key_exists ('query', $bits))) {
        return false;
    }
    if (! in_array ($bits['scheme'], $valid_schemes)) {
        return false;
    }
    if (! in_array ($bits['host'], $valid_hosts)) {
        return false;
    }
    if (! in_array ($bits['path'], $valid_paths)) {
        return false;
    }
    $querypairs = explode ('&', $bits['query']);
    if (count ($querypairs) < 1) {
        return false;
    }
    foreach ($querypairs as $querypair) {
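        # Split into at most 2 parts and pad, so a pair without "=" raises no notice
        list ($key, $value) = array_pad (explode ('=', $querypair, 2), 2, '');
        if ($key === 'v') {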
            if (preg_match ('/^[a-zA-Z0-9\-_]+$/', $value)) {
                # Set the return value
                $vidid = $value;
            }
        }
    }

    return $vidid;
}
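
The function returns only the ID; reassembling a pristine URL is a separate step. Here is a minimal sketch, assuming the ID has already been validated and that https://www.youtube.com/watch is the form you want to normalise to. build_watch_url is a hypothetical name, not part of the function above.

```php
<?php
// Hypothetical helper: builds a canonical watch URL from an already validated video ID.
function build_watch_url(string $vidid): string
{
    // http_build_query URL-encodes the value; a validated ID passes through unchanged.
    return 'https://www.youtube.com/watch?' . http_build_query(['v' => $vidid]);
}

echo build_watch_url('Zu4WXiPRek'), PHP_EOL; // https://www.youtube.com/watch?v=Zu4WXiPRek
```

Rebuilding the URL from the ID alone drops any extra query parameters and normalises the scheme and host in one step.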



Answer 5:


The following regex is meant to match the common forms of YouTube link (youtube.com and youtu.be, with watch, embed/ or v/ paths):

$pattern='@(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~#&=;%+?!-]+))@si';
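
Here is a sketch of how a pattern like this might be exercised. It assumes the hyphen sits at the end of the character class so that it is read as a literal character rather than a range; the sample links and variable names are illustrative.

```php
<?php
$pattern = '@(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~#&=;%+?!-]+))@si';

$links = [
    'http://youtu.be/Zu4WXiPRek',
    'https://www.youtube.com/embed/Zu4WXiPRek',
    'http://www.youtube.com/watch?feature=share&v=Zu4WXiPRek',
    'http://example.com/watch?v=Zu4WXiPRek',
];

$tails = [];
foreach ($links as $link) {
    // Group 10 is the final character-class group: whatever follows the recognised path prefix.
    $tails[] = preg_match($pattern, $link, $m) ? $m[10] : null;
}

print_r($tails);
```

Because the final class also admits &, = and ?, group 10 can include trailing parameters (for example &t=10 after the ID), so a stricter check on the ID itself is still needed.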


Source: https://stackoverflow.com/questions/3737634/regex-to-match-youtube-urls
