mechanize

Python authenticate and launch private page using webbrowser, urllib and CookieJar

萝らか妹 submitted on 2019-11-28 11:07:50
Question: I want to log in with a CookieJar and then launch not the login page but a page that can only be seen after authenticating. I know mechanize does that, but besides it not working for me right now, I would rather do this without it. So far I have: import urllib, urllib2, cookielib, webbrowser from cookielib import CookieJar username = 'my_username' password = 'my_password' url = 'my_login_page' cj = cookielib.CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) login_data = urllib.urlencode({'my
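A minimal Python 3 sketch of the same flow (urllib2/cookielib became urllib.request/http.cookiejar in Python 3); the URLs and form field names here are placeholders, not taken from the original question:

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical URLs and field names -- substitute your own site's values.
LOGIN_URL = "http://example.com/login.php"
PRIVATE_URL = "http://example.com/private.php"

cj = CookieJar()
# Every request made through this opener stores and replays cookies from cj.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

login_data = urllib.parse.urlencode(
    {"username": "my_username", "password": "my_password"}
).encode("utf-8")

def login_and_fetch():
    # POST the credentials; the server's session cookie lands in cj.
    opener.open(LOGIN_URL, login_data)
    # The same opener now sends that cookie, so the private page is visible.
    return opener.open(PRIVATE_URL).read()
```

Note that cookies held in a CookieJar exist only inside this script's opener; webbrowser.open() launches a separate browser process that never sees them, so the private page has to be fetched (and, if needed, saved or re-served) by the script itself.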

mechanize._mechanize.FormNotFoundError: no form matching name 'q'

喜夏-厌秋 submitted on 2019-11-28 10:41:09
Question: Can anyone help me get this form selection correct? Trying to crawl Google, I get the error: mechanize._mechanize.FormNotFoundError: no form matching name 'q'. This is unusual, since I have seen several other tutorials use it. P.S. I don't plan to slam Google with requests; I just hope to use an automatic selector to take the effort out of finding academic citation PDFs from time to time. <f GET http://www.google.com.tw/search application/x-www-form-urlencoded <HiddenControl(ie=Big5)
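The error arises because 'q' is the name of the *text control* inside Google's search form, not the name of the form itself. A sketch that selects the form by looking for a control called 'q' (using select_form's predicate keyword):

```python
def google_search(query):
    # Third-party dependency: pip install mechanize (0.4+ runs on Python 3).
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)  # google.com's robots.txt disallows scripts
    br.open("http://www.google.com/")
    # Select the form that *contains* a control named 'q', rather than
    # asking for a form *named* 'q' (no such form exists).
    br.select_form(predicate=lambda f: any(c.name == "q" for c in f.controls))
    br["q"] = query
    return br.submit().read()
```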

How do I grab a thumbnail screenshot of many websites?

浪子不回头ぞ submitted on 2019-11-28 10:39:45
Question: I have a list of 2500 websites and need to grab a thumbnail screenshot of each of them. How do I do that? I could try to parse the sites with either Perl or Python; Mechanize would be a good fit. But I am not so experienced with Perl. Answer 1: Here is a Perl solution: use WWW::Mechanize::Firefox; my $mech = WWW::Mechanize::Firefox->new(); $mech->get('http://google.com'); my $png = $mech->content_as_png(); From the docs: Returns the given tab or the current page rendered as a PNG image. All parameters are
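Since the rest of this digest is Python-centric, the same batch-screenshot job can be sketched with headless Chrome via Selenium (an assumption on my part; the original answer used Perl's WWW::Mechanize::Firefox):

```python
import os

def capture_thumbnails(urls, out_dir="thumbs", size=(1024, 768)):
    # Third-party dependency: pip install selenium (plus a Chrome driver).
    from selenium import webdriver

    os.makedirs(out_dir, exist_ok=True)
    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    opts.add_argument(f"--window-size={size[0]},{size[1]}")
    driver = webdriver.Chrome(options=opts)
    try:
        for i, url in enumerate(urls):
            try:
                driver.get(url)
                driver.save_screenshot(os.path.join(out_dir, f"{i:04d}.png"))
            except Exception:
                # With 2500 sites, some will time out or refuse; skip them.
                continue
    finally:
        driver.quit()
```

Reusing one driver across all 2500 sites avoids paying browser start-up cost per page.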

Submitting Forms with Mechanize (Python)

自闭症网瘾萝莉.ら submitted on 2019-11-28 10:11:58
Well, I am trying to login to a site using Python and mechanize. I've got the site opened: site = br.open("http://example.com/login.php") And I've got a list of the forms (with br.forms). <GET http://example.com/search.php application/x-www-form-urlencoded <HiddenControl(search=1) (readonly)> ... <POST http://example.com/login.php application/x-www-form-urlencoded <TextControl(username=)> <PasswordControl(password=)> <CheckboxControl(stay=[1])> <SubmitControl(<None>=Log in) (readonly)>> I've been trying to submit the username and password fields. I tried doing it like this: br.select_form(nr=0
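Given the form listing above, nr=0 selects the GET search form, so the POST login form is nr=1. A sketch under that assumption (the control names match the listing):

```python
def login(url, username, password):
    # Third-party dependency: pip install mechanize
    import mechanize

    br = mechanize.Browser()
    br.open(url)
    # nr counts forms in page order: nr=0 is the GET search form listed
    # first, so the POST login form shown above is nr=1.
    br.select_form(nr=1)
    br["username"] = username
    br["password"] = password
    return br.submit()
```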

How to properly use mechanize to scrape AJAX sites

蓝咒 submitted on 2019-11-28 09:08:59
Question: I am fairly new to web scraping. There is a site with a table on it whose values are controlled by JavaScript. Those values determine the addresses of future pages that the JavaScript tells my browser to request; those pages return JSON responses that the script uses to update the table in my browser. So I wanted to build a class with a mechanize method that takes in a URL and spits out the response body, the first time HTML; afterwards, the body response
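The usual trick for such AJAX tables is to skip the JavaScript entirely and call the JSON endpoints yourself, sharing one cookie-aware opener between the initial HTML page and the follow-up JSON requests. A stdlib sketch of the class the questioner describes (class and method names are my own):

```python
import json
import urllib.request
from http.cookiejar import CookieJar

class AjaxFetcher:
    """One cookie-aware opener shared by the HTML page and the JSON
    endpoints its JavaScript would normally call."""

    def __init__(self):
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar()))

    def body(self, url):
        # Raw response body: HTML for the first page, JSON afterwards.
        with self._opener.open(url) as resp:
            return resp.read()

def parse_rows(body):
    # Decode a JSON body into the table rows it carries.
    return json.loads(body)
```

The HTML response is parsed once to discover the JSON URLs; each JSON body then goes through parse_rows.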

Is it possible to hook up a more robust HTML parser to Python mechanize?

本小妞迷上赌 submitted on 2019-11-28 07:46:33
I am trying to parse and submit a form on a website using mechanize, but it appears that the built-in form parser cannot detect the form and its elements. I suspect it is choking on poorly formed HTML, and I'd like to try pre-parsing it with a parser better designed to handle bad HTML (say, lxml or BeautifulSoup) and then feeding the prettified, cleaned-up output to the form parser. I need mechanize not only for submitting the form but also for maintaining sessions (I'm working with this form from within a login session). I'm not sure how to go about doing this, if it is indeed possible. I'm
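This is possible: mechanize's response objects expose get_data/set_data, and Browser.set_response() re-runs the form parser on whatever you hand back, without dropping the session cookies. A sketch that cleans the page with BeautifulSoup first:

```python
def reparse_with_soup(br):
    # Third-party dependencies: pip install mechanize beautifulsoup4
    from bs4 import BeautifulSoup

    resp = br.response()
    soup = BeautifulSoup(resp.get_data(), "html.parser")
    # set_response() makes mechanize re-run its own form parser on the
    # tidied markup while keeping cookies and history intact.
    resp.set_data(soup.prettify().encode("utf-8"))
    br.set_response(resp)
    return br
```

After calling this on an open Browser, br.forms() and br.select_form() operate on the cleaned HTML.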

Python browser emulator with JS support [closed]

房东的猫 submitted on 2019-11-28 07:06:33
I want to grab some data from a site. Usually I use mechanize for such things, but now the site delivers the data with JavaScript. Alas, mechanize doesn't support it. What can I use instead? unutbu: Here are some options: Selenium (tutorial); for headless automation, Selenium can be used in conjunction with PhantomJS; WebKit; Spidermonkey. Here are some code examples: PyQt4 + WebKit; an example using PyQt4 + WebKit, redone with Selenium. Source: https://stackoverflow.com/questions/21777306/python-browser-emulator-with-js-support
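Of the options listed, Selenium is the most common replacement today (PhantomJS is unmaintained; headless Chrome fills that role now, an update beyond the original answer). A sketch that returns the DOM after JavaScript has run, which is exactly what mechanize cannot produce:

```python
def rendered_html(url):
    # Third-party dependency: pip install selenium (plus a Chrome driver).
    from selenium import webdriver

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        # page_source reflects the DOM *after* scripts have executed.
        return driver.page_source
    finally:
        driver.quit()
```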

Python's mechanize proxy support

不羁的心 submitted on 2019-11-28 06:36:51
I have a question about Python mechanize's proxy support. I'm writing a web client script, and I would like to add a proxy support function to it. For example, if I have: params = urllib.urlencode({'id':id, 'passwd':pw}) rq = mechanize.Request('http://www.example.com', params) rs = mechanize.urlopen(rq) How can I add proxy support to my mechanize script? Whenever I open the www.example.com website, I would like it to go through the proxy. Answer 1: You use mechanize.Request.set_proxy(host, type) (at least as of 0.1.11); assuming an HTTP proxy running at localhost:8888: req = mechanize
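Completing the answer's idea as a small helper (function name and the localhost:8888 proxy are illustrative):

```python
from urllib.parse import urlencode

def fetch_via_proxy(url, params, proxy="localhost:8888"):
    # Third-party dependency: pip install mechanize
    import mechanize

    req = mechanize.Request(url, urlencode(params).encode("utf-8"))
    # Route this single request through an HTTP proxy.
    req.set_proxy(proxy, "http")
    return mechanize.urlopen(req).read()
```

If you use a mechanize.Browser instead of raw Requests, br.set_proxies({"http": "localhost:8888"}) applies the proxy to every request the browser makes.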

Python unable to retrieve form with urllib or mechanize

不羁岁月 submitted on 2019-11-28 04:29:22
Question: I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php. The page is in Dutch, but that is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php. First of all, this is the urllib/urllib2 method I've tried:
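A common pitfall with this pattern is posting to the page that displays the form instead of the form's action URL. A Python 3 sketch that targets query.php directly (the field names are assumptions; they must be read from the form's HTML):

```python
import urllib.parse
import urllib.request

ACTION_URL = "http://zrs.leidenuniv.nl/ul/query.php"

def submit_form(fields):
    # POST to the form's *action* URL (query.php), not the page that
    # merely renders the form (start.php).
    data = urllib.parse.urlencode(fields).encode("utf-8")
    req = urllib.request.Request(ACTION_URL, data)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

If the site also requires a session cookie from start.php, wrap both requests in one cookie-aware opener as in the CookieJar example earlier in this digest.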

Python mechanize, following link by url and what is the nr parameter?

独自空忆成欢 submitted on 2019-11-28 03:24:31
I'm sorry to have to ask something like this, but Python mechanize's documentation seems to be really lacking and I can't figure this out. They only give one example that I can find for following a link: response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1) But I don't want to use a regex; I just want to follow a link based on its URL. How would I do this? Also, what is the "nr" parameter that is sometimes used when following links? Thanks for any info. Answer 1: br.follow_link takes either a Link object or a keyword arg (such as nr=0). br.links() lists all the links. br.links(url_regex='...') lists all the
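The keyword filters br.follow_link accepts are the same ones find_link takes, so matching an exact href needs no regex at all:

```python
def follow_by_url(br, target_url):
    # url= matches the link's href exactly (use url_regex= for patterns).
    # nr is the 0-based index among the links that *match the filters*:
    # nr=0 follows the first match, nr=1 the second, and so on.
    return br.follow_link(url=target_url, nr=0)
```

So in the documentation's own example, nr=1 means "the second link whose text matches the regex", not a form or page number.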