I've been trying to figure out the best way to validate a URL (specifically in Python), but I haven't really been able to find an answer.
I would use the validators package (its documentation covers installation and usage). Using it is as simple as:
import validators
url = 'YOUR URL'
validators.url(url)
It returns a truthy value if the URL is valid, and a falsy one if not.
The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance with the RFC standard, relying only on the standard library's re module. It's also very easy to use:
from validator_collection import validators, checkers
checkers.is_url('http://www.stackoverflow.com')
# Returns True
checkers.is_url('not a valid url')
# Returns False
value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'
value = validators.url('not a valid url')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
value = validators.url('https://123.12.34.56:1234')
# value set to 'https://123.12.34.56:1234'
value = validators.url('http://10.0.0.1')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
value = validators.url('http://10.0.0.1', allow_special_ips=True)
# value set to 'http://10.0.0.1'
In addition, Validator-Collection includes 60+ other validators, covering IP addresses (IPv4 and IPv6), domains, and email addresses as well, which folks might find useful.
Assuming you are using Python 3, you could use urllib. The code would go something like this:
import urllib.request as req

def foo():
    url = 'http://bar.com'
    request = req.Request(url)
    try:
        response = req.urlopen(request)
        # response is an http.client.HTTPResponse object;
        # response.read() returns the page's HTML
    except req.URLError:
        # The URL wasn't valid, or the request failed
        pass
If there is no error on the "response = ..." line, then the URL is valid. Note that this checks that the URL actually answers over the network, not just that it is well-formed.
This looks like it might be a duplicate of How do you validate a URL with a regular expression in Python? You should be able to use the urlparse approach described there.
>>> from urllib.parse import urlparse # python2: from urlparse import urlparse
>>> urlparse('actually not a url')
ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')
Call urlparse on the string you want to check, then make sure that the resulting ParseResult has non-empty scheme and netloc attributes.
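That check can be sketched as follows; the is_probably_url helper name is my own:

```python
from urllib.parse import urlparse

def is_probably_url(s):
    # A syntactically valid absolute URL has both a scheme
    # (e.g. 'http') and a network location (e.g. 'google.com').
    parts = urlparse(s)
    return bool(parts.scheme and parts.netloc)

print(is_probably_url('http://google.com'))   # True
print(is_probably_url('actually not a url'))  # False
```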
You can also try using urllib.request to validate the URL, by passing it to the urlopen function and catching URLError:
from urllib.request import urlopen
from urllib.error import URLError

def validate_web_url(url="http://google"):
    try:
        urlopen(url)
        return True
    except URLError:
        return False
This would return False in this case.
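Putting the two approaches together, here is a rough sketch (the validate_url name and check_reachable flag are my own) that rejects malformed strings cheaply and only goes to the network when asked:

```python
from urllib.parse import urlparse
from urllib.request import urlopen
from urllib.error import URLError

def validate_url(url, check_reachable=False):
    # Cheap syntactic check first: require a scheme and a netloc.
    parts = urlparse(url)
    if not (parts.scheme and parts.netloc):
        return False
    if not check_reachable:
        return True
    # Optionally confirm the URL actually answers over the network.
    try:
        urlopen(url, timeout=5)
        return True
    except (URLError, ValueError):
        return False

print(validate_url('http://google.com'))  # True (syntax only)
print(validate_url('not a url'))          # False
```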