HTTPS log in with urllib2

Anonymous (unverified), submitted 2019-12-03 02:52:02

Question:

I currently have a little script that downloads a webpage and extracts some data I'm interested in. Nothing fancy.

Currently I'm downloading the page like so:

    import commands

    command = 'wget --output-document=- --quiet --http-user=USER --http-password=PASSWORD https://www.example.ca/page.aspx'
    status, text = commands.getstatusoutput(command)

Although this works perfectly, I thought it'd make sense to remove the dependency on wget. I thought it should be trivial to convert the above to urllib2, but thus far I've had zero success. The Internet is full of urllib2 examples, but I haven't found anything that matches my need for simple username and password HTTP authentication with an HTTPS server.

Answer 1:

The requests module provides a modern API to HTTP/HTTPS capabilities.

    import requests

    url = 'https://www.someserver.com/toplevelurl/somepage.htm'

    res = requests.get(url, auth=('USER', 'PASSWORD'))

    status = res.status_code
    text = res.text


Answer 2:

This says it should be straightforward

[as] long as your local Python has SSL support.

If you use just HTTP Basic Authentication, you must set a different handler, as described here.

Quoting the example there:

    import urllib2

    theurl = 'http://www.someserver.com/toplevelurl/somepage.htm'
    username = 'johnny'
    password = 'XXXXXX'
    # a great password

    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    # this creates a password manager
    passman.add_password(None, theurl, username, password)
    # because we have put None at the start it will always
    # use this username/password combination for urls
    # for which `theurl` is a super-url

    authhandler = urllib2.HTTPBasicAuthHandler(passman)
    # create the AuthHandler

    opener = urllib2.build_opener(authhandler)

    urllib2.install_opener(opener)
    # All calls to urllib2.urlopen will now use our handler
    # Make sure not to include the protocol in with the URL, or
    # HTTPPasswordMgrWithDefaultRealm will be very confused.
    # You must (of course) use it when fetching the page though.

    pagehandle = urllib2.urlopen(theurl)
    # authentication is now handled automatically for us
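In Python 3, urllib2 was merged into urllib.request, so the quoted example needs different import names there. What follows is a minimal sketch of the same approach for the HTTPS URL from the question, assuming Python 3. The helper name `build_basic_auth_opener` is hypothetical, and the URL and credentials are the question's placeholders.

```python
import urllib.request


def build_basic_auth_opener(url, username, password):
    """Return an opener that answers HTTP Basic challenges for `url`.

    Hypothetical helper, not part of urllib; it only wires together
    the standard handler classes.
    """
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm" under `url`.
    passman.add_password(None, url, username, password)
    authhandler = urllib.request.HTTPBasicAuthHandler(passman)
    return urllib.request.build_opener(authhandler)


# Placeholder URL and credentials taken from the question.
opener = build_basic_auth_opener(
    "https://www.example.ca/page.aspx", "USER", "PASSWORD"
)

# Fetching is left commented out because it needs network access:
# with opener.open("https://www.example.ca/page.aspx") as response:
#     status = response.status
#     text = response.read().decode("utf-8")  # assumes a UTF-8 page
```

Returning the opener instead of calling `install_opener` keeps the credentials local to this one opener, which may be preferable when a script talks to more than one server.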

If you do Digest, you'll have to set some additional headers, but they are the same regardless of SSL usage. Google for python+urllib2+http+digest.
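The standard library also ships a handler class for Digest, `HTTPDigestAuthHandler`, which computes the Digest response itself once the server issues a challenge. A hedged sketch, assuming Python 3's urllib.request (Python 2 has a class of the same name in urllib2); the URL and credentials are the question's placeholders.

```python
import urllib.request

url = "https://www.example.ca/page.aspx"  # placeholder from the question

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, "USER", "PASSWORD")

# This handler reacts to a 401 response carrying a Digest challenge.
digest_handler = urllib.request.HTTPDigestAuthHandler(passman)
digest_opener = urllib.request.build_opener(digest_handler)

# digest_opener.open(url) would perform the request (needs network access).
```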

Cheers,



Answer 3:

The urllib2 documentation has an example of working with Basic Authentication:

http://docs.python.org/library/urllib2.html#examples
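A handler such as `HTTPBasicAuthHandler` sends credentials only after the server replies with a 401 challenge. If a server never issues that challenge, one alternative (a different technique from the documentation example) is to attach the `Authorization` header up front. A minimal sketch, assuming Python 3; `basic_auth_header` is a hypothetical helper, and the URL and credentials are the question's placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder URL and credentials from the question.
request = urllib.request.Request("https://www.example.ca/page.aspx")
request.add_header("Authorization", basic_auth_header("USER", "PASSWORD"))

# urllib.request.urlopen(request) would send the request (needs network access).
```

Basic credentials are only base64-encoded, not encrypted, so this is best kept to HTTPS URLs like the one in the question.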


