Browser simulation - Python

Posted by 帅比萌擦擦* on 2019-12-21 09:22:18

Question


I need to access a few HTML pages through a Python script, problem is that I need COOKIE functionality, therefore a simple urllib HTTP request won't work.

Any ideas?


Answer 1:


Check out Mechanize. "Stateful programmatic web browsing in Python".
It handles cookies automagically.

import mechanize

br = mechanize.Browser()
resp = br.open("http://www.mysitewithcookies.com/")
print(resp.info())  # response headers
print(resp.read())  # page content

mechanize also exposes the urllib2 API, with cookie handling enabled by default.




Answer 2:


The cookielib module provides cookie handling for HTTP clients.

The cookielib module defines classes for automatic handling of HTTP cookies. It is useful for accessing web sites that require small pieces of data – cookies – to be set on the client machine by an HTTP response from a web server, and then returned to the server in later HTTP requests.

The examples in the docs show how to process cookies in conjunction with urllib2:

import cookielib, urllib2

cj = cookielib.CookieJar()  # holds cookies across requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")  # cookies set by the response land in cj
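For anyone on Python 3: cookielib was renamed to http.cookiejar and urllib2's pieces now live in urllib.request, but the pattern is the same. A minimal sketch (the URL is just a placeholder):

```python
import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()  # holds cookies across requests
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
# r = opener.open("http://example.com/")  # cookies set by the response land in cj
```

Every request made through this opener sends back any cookies the server has set, which is usually all the "browser simulation" a login-protected page needs.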



Answer 3:


Here's something that does cookies, and as a bonus does authentication for a site that requires a username and password.

import urllib2
import cookielib


def cook():
    url = "http://wherever"
    cj = cookielib.LWPCookieJar()
    authinfo = urllib2.HTTPBasicAuthHandler()
    realm = "realmName"
    username = "userName"
    password = "passWord"
    host = "www.wherever.com"
    authinfo.add_password(realm, host, username, password)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
    urllib2.install_opener(opener)

    # Create the request object with a browser-like User-Agent header
    txheaders = {'User-agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    try:
        req = urllib2.Request(url, None, txheaders)
        cj.add_cookie_header(req)
        f = urllib2.urlopen(req)

    except IOError as e:
        print("Failed to open", url)
        if hasattr(e, 'code'):
            print("Error code:", e.code)

    else:
        print(f)
        print(f.read())
        print(f.info())
        f.close()
        print('Cookies:')
        for index, cookie in enumerate(cj):
            print(index, ":", cookie)
        cj.save("cookies.lwp")
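The same flow translates to Python 3 with http.cookiejar and urllib.request. A sketch using the answer's placeholder URL, realm, and credentials (the network call itself is commented out):

```python
import http.cookiejar
import urllib.request

cj = http.cookiejar.LWPCookieJar("cookies.lwp")  # file-backed cookie jar
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password("realmName", "www.wherever.com", "userName", "passWord")
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj), authinfo)
urllib.request.install_opener(opener)

# Request with a browser-like User-Agent header
req = urllib.request.Request("http://wherever",
                             headers={"User-agent": "Mozilla/5.0"})
# f = urllib.request.urlopen(req)  # would send the request; cookies end up in cj
# cj.save()                        # persist the cookies to cookies.lwp
```

Because the opener is installed globally, any later urllib.request.urlopen call reuses both the saved cookies and the basic-auth credentials.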



Answer 4:


Why don't you try Dryscrape for this:

import dryscrape
dryscrape.start_xvfb()  # headless X server, needed on machines without a display
br = dryscrape.Session()
br.visit('http://URL.COM')                                # open the web page
br.at_xpath('//*[@id = "email"]').set('user@email.com')   # find the input by id and fill it
br.at_xpath('//*[@id = "pass"]').set('password')
br.at_xpath('//*[@id = "submit_button"]').click()         # click the submit button

You don't need cookielib to store cookies; just install Dryscrape and do it your own way.



来源:https://stackoverflow.com/questions/2567738/browser-simulation-python
