Validating URLs in Python

广开言路 2020-12-15 06:43

I've been trying to figure out what the best way to validate a URL is (specifically in Python) but haven't really been able to find an answer. It seems like there isn't o

5 Answers
  • 2020-12-15 06:57

    I would use the validators package. Here is the link to the documentation and installation instructions.

    It is just as simple as

    import validators
    url = 'YOUR URL'
    validators.url(url)
    

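    It returns True if the URL is valid. If not, it returns a failure object (ValidationFailure in releases from around the time of this answer) that evaluates to False in a boolean context, rather than the literal False.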

  • 2020-12-15 07:11

    The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance against the RFC standard. Some details:

    • Tested against Python 2.7, 3.4, 3.5, 3.6, 3.7, and 3.8
    • No dependencies on Python 3.x, one conditional dependency in Python 2.x (drop-in replacement for Python 2.x's buggy re module)
    • Unit tests that cover 100+ different succeeding/failing URL patterns, including non-standard characters and the like. As close to covering the whole spectrum of the RFC standard as I've been able to find.

    It's also very easy to use:

    from validator_collection import validators, checkers
    
    checkers.is_url('http://www.stackoverflow.com')
    # Returns True
    
    checkers.is_url('not a valid url')
    # Returns False
    
    value = validators.url('http://www.stackoverflow.com')
    # value set to 'http://www.stackoverflow.com'
    
    value = validators.url('not a valid url')
    # raises a validator_collection.errors.InvalidURLError (which is a ValueError)
    
    value = validators.url('https://123.12.34.56:1234')
    # value set to 'https://123.12.34.56:1234'
    
    value = validators.url('http://10.0.0.1')
    # raises a validator_collection.errors.InvalidURLError (which is a ValueError)
    
    value = validators.url('http://10.0.0.1', allow_special_ips=True)
    # value set to 'http://10.0.0.1'
    

    In addition, Validator-Collection includes 60+ other validators, covering IP addresses (IPv4 and IPv6), domains, and email addresses among others, which folks might find useful.

  • 2020-12-15 07:11

    Assuming you are using Python 3, you could use urllib. The code would go something like this:

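    import urllib.request as req
    from urllib.error import URLError
    
    def foo():
        url = 'http://bar.com'
        try:
            request = req.Request(url)  # raises ValueError if the URL has no recognizable scheme
            response = req.urlopen(request)
            # response is an HTTPResponse object; response.read() returns the page's HTML as bytes
            return True
        except (URLError, ValueError):
            # The URL could not be parsed or opened
            return False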
    

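    If there is no error on the line "response = ..." then the URL could be opened. Note that this tests reachability and needs network access, so a well-formed URL for a server that is down will also fail.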

  • 2020-12-15 07:20

    This looks like it might be a duplicate of "How do you validate a URL with a regular expression in Python?"

    You should be able to use the urlparse library described there.

    >>> from urllib.parse import urlparse # python2: from urlparse import urlparse
    >>> urlparse('actually not a url')
    ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
    >>> urlparse('http://google.com')
    ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')
    

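    Call urlparse on the string you want to check, then make sure that the scheme and netloc attributes of the resulting ParseResult are both non-empty. The attributes always exist; for a string that is not a URL they are simply empty, as in the first example above.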
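    The check described above can be wrapped in a small helper. This is a minimal sketch, not a full RFC validator: the name is_valid_url and the allowed_schemes parameter are made up for illustration, and restricting the scheme to http/https is an assumption that goes beyond the answer.

```python
from urllib.parse import urlparse


def is_valid_url(url, allowed_schemes=("http", "https")):
    """Return True if url has an allowed scheme and a non-empty netloc.

    Hypothetical helper: this is a syntax-level check only and does not
    contact the server.
    """
    try:
        result = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. an unclosed IPv6 literal
        return False
    return result.scheme in allowed_schemes and bool(result.netloc)


print(is_valid_url("actually not a url"))  # False
print(is_valid_url("http://google.com"))   # True
```

    Because nothing is fetched, this accepts any well-formed URL, including ones that point at hosts that do not exist.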

  • 2020-12-15 07:20

    You can also try using urllib.request to validate a URL by passing it to the urlopen function and catching URLError.

    from urllib.request import urlopen
    from urllib.error import URLError
    
    def validate_web_url(url="http://google"):
        try:
            urlopen(url)
            return True
        except URLError:
            return False
    

    With the default URL "http://google", this would return False, as long as that host name does not resolve on your network.
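    One possible refinement, sketched here under stated assumptions: check the syntax first so that obviously malformed strings never trigger a network call, send a HEAD request so no body is downloaded, and set a timeout. The name url_is_reachable and the 5-second default are hypothetical choices, not part of the answer above.

```python
import http.client
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def url_is_reachable(url, timeout=5.0):
    """Return True if an HTTP(S) HEAD request to url succeeds.

    Hypothetical helper. Servers that reject HEAD requests, or that answer
    with a 4xx/5xx status, are reported as unreachable.
    """
    try:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return False  # fails the syntax check, so no request is sent
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout):
            return True
    except (ValueError, OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


print(url_is_reachable("actually not a url"))  # False, without touching the network
```

    The result for a real URL depends on the network at the time of the call, so treat a False here as "could not be confirmed" rather than "invalid".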
