Python: urllib/urllib2/httplib confusion

陌路散爱 提交于 2019-11-27 16:55:56

Focus on urllib2 for this, it works quite well. Don't mess with httplib, it's not the top-level API.

What you're noting is that urllib2 doesn't follow the redirect.

You need to fold in an instance of HTTPRedirectHandler that will catch and follow the redirects.

Further, you may want to subclass the default HTTPRedirectHandler to capture information that you'll then check as part of your unit testing.

cookie_handler= urllib2.HTTPCookieProcessor( self.cookies )
redirect_handler= HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)

You can then use this opener object to POST and GET, handling redirects and cookies properly.

You may want to add your own subclass of HTTPHandler to capture and log various error codes, also.

Here's my take on this issue.

#!/usr/bin/env python

import urllib
import urllib2


class HttpBot:
    """an HttpBot represents one browser session, with cookies."""
    def __init__(self):
        cookie_handler= urllib2.HTTPCookieProcessor()
        redirect_handler= urllib2.HTTPRedirectHandler()
        self._opener = urllib2.build_opener(redirect_handler, cookie_handler)

    def GET(self, url):
        return self._opener.open(url).read()

    def POST(self, url, parameters):
        return self._opener.open(url, urllib.urlencode(parameters)).read()


if __name__ == "__main__":
    bot = HttpBot()
    ignored_html = bot.POST('https://example.com/authenticator', {'passwd':'foo'})
    print bot.GET('https://example.com/interesting/content')
    ignored_html = bot.POST('https://example.com/deauthenticator',{})

@S.Lott, thank you. Your suggestion worked for me, with some modification. Here's how I did it.

data = urllib.urlencode(params)
url = host+page
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)

cookies = CookieJar()
cookies.extract_cookies(response,request)

cookie_handler= urllib2.HTTPCookieProcessor( cookies )
redirect_handler= HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)

response = opener.open(request)

I had to do this exact thing myself recently. I only needed classes from the standard library. Here's an excerpt from my code:

from urllib import urlencode
from urllib2 import urlopen, Request

# encode my POST parameters for the login page
login_qs = urlencode( [("username",USERNAME), ("password",PASSWORD)] )

# extract my session id by loading a page from the site
set_cookie = urlopen(URL_BASE).headers.getheader("Set-Cookie")
sess_id = set_cookie[set_cookie.index("=")+1:set_cookie.index(";")]

# construct headers dictionary using the session id
headers = {"Cookie": "session_id="+sess_id}

# perform login and make sure it worked
if "Announcements:" not in urlopen(Request(URL_BASE+"login",headers=headers), login_qs).read():
    print "Didn't log in properly"
    exit(1)

# here's the function I used after this for loading pages
def download(page=""):
    return urlopen(Request(URL_BASE+page, headers=headers)).read()

# for example:
print download(URL_BASE + "config")

I'd give Mechanize (http://wwwsearch.sourceforge.net/mechanize/) a shot. It may well handle your cookie/headers transparently.

Try twill - a simple language that allows users to browse the Web from a command-line interface. With twill, you can navigate through Web sites that use forms, cookies, and most standard Web features. More to the point, twill is written in Python and has a python API, e.g:

from twill import get_browser
b = get_browser()

b.go("http://www.python.org/")
b.showforms()

Besides the fact that you may be missing a cookie, there might be some field(s) in the form that you are not POSTing to the webserver. The best way would be to capture the actual POST from a web browser. You can use LiveHTTPHeaders or WireShark to snoop the traffic and mimic the same behaviour in your script.

Funkload is a great web app testing tool also. It wraps webunit to handle the browser emulation, then gives you both functional and load testing features on top.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!