ValueError: unknown url type in urllib2, though the url is fine if opened in a browser

Question

I am trying to download a URL using urllib2 in Python.

The code is the following:

import urllib2
req = urllib2.Request('www.tattoo-cover.co.uk')
req.add_header('User-agent','Mozilla/5.0')
result = urllib2.urlopen(req)

It raises a ValueError and the program crashes for the URL in the example. When I open the URL in a browser, it works fine.

Any ideas how to handle the problem?

UPDATE:

Thanks to Ben James and sth, the problem has been identified: the URL needs the 'http://' prefix.

Now the refined question: is it possible to handle such cases automatically with some built-in function, or do I have to do error handling followed by string concatenation?

Answer 1:

When you enter a URL in a browser without the protocol, it defaults to HTTP. urllib2 won't make that assumption for you; you need to prefix it with http://.

Answer 2:

You have to use a complete URL including the protocol, not just specify a host name.

The correct URL would be http://www.tattoo-cover.co.uk/.
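
As a rough illustration of the point above, here is a minimal sketch using Python 3's urllib.request (the successor of urllib2, so this is not the asker's exact environment). It only builds the request objects and makes no network call. The variable names bare, error_message and req are made up for this example. Note one difference: Python 3 rejects the scheme-less URL when the Request is constructed, while the traceback in the question comes from the later urlopen call.

```python
import urllib.request

bare = "www.tattoo-cover.co.uk"

# Without a scheme, Python 3's Request constructor raises ValueError.
try:
    urllib.request.Request(bare)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With the scheme included, the request object is built normally.
req = urllib.request.Request(
    "http://" + bare + "/",
    headers={"User-Agent": "Mozilla/5.0"},
)

print(error_message)  # the "unknown url type" message
print(req.full_url)   # the complete URL that would be fetched
```

Passing req to urllib.request.urlopen would then perform the actual download.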

Answer 3:

You can use the urlparse function from urllib.parse (Python 3) to check for the presence of an addressing scheme (http, https, ftp) and prepend a scheme if it is missing:

In [1]: from urllib.parse import urlparse
    ..: 
    ..: url = 'www.myurl.com'
    ..: if not urlparse(url).scheme:
    ..:     url = 'http://' + url
    ..: 
    ..: url
Out[1]: 'http://www.myurl.com'
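
To handle this automatically, as the refined question asks, the same check could be wrapped in a small helper. This is only a sketch: the name ensure_scheme and its default_scheme parameter are hypothetical and not part of urllib. It also assumes that plain http is an acceptable default, which may not hold for every site.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it has a scheme, else prefix default_scheme."""
    url = url.strip()
    if urlparse(url).scheme:
        return url
    # Drop leading slashes so '//host/path' does not become 'http:////host/path'.
    return default_scheme + "://" + url.lstrip("/")


print(ensure_scheme("www.tattoo-cover.co.uk"))
print(ensure_scheme("https://example.com/a"))
```

One caveat: inputs of the form host:port, without a scheme, contain a colon and may be parsed as if the host were the scheme, so they would need separate handling.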

Answer 4:

I think you can use the urlparse function for that:

Python User Documentation

Source: https://stackoverflow.com/questions/5823572/valueerror-unknown-url-type-in-urllib2-though-the-url-is-fine-if-opened-in-a-b

Tags

python

urllib2

httprequest