mechanize

Python authenticate and launch private page using webbrowser, urllib and CookieJar

萝らか妹 submitted on 2019-11-28 11:07:50
Question: I want to log in with a CookieJar and then launch not the login page but a page that can only be seen after authenticating. I know mechanize does that, but besides it not working for me right now, I would rather do this without it. So far I have: import urllib, urllib2, cookielib, webbrowser from cookielib import CookieJar username = 'my_username' password = 'my_password' url = 'my_login_page' cj = cookielib.CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) login_data = urllib.urlencode({'my
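A minimal Python 3 sketch of the same flow (urllib2/cookielib became urllib.request/http.cookiejar in Python 3); the URLs and form field names here are placeholders, not taken from the original question:

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical URLs and field names -- substitute your own site's values.
LOGIN_URL = "http://example.com/login.php"
PRIVATE_URL = "http://example.com/private.php"

cj = CookieJar()
# Every request made through this opener stores and replays cookies from cj.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

login_data = urllib.parse.urlencode(
    {"username": "my_username", "password": "my_password"}
).encode("utf-8")

def login_and_fetch():
    # POST the credentials; the server's session cookie lands in cj.
    opener.open(LOGIN_URL, login_data)
    # The same opener now sends that cookie, so the private page is visible.
    return opener.open(PRIVATE_URL).read()
```

Note that cookies held in a CookieJar exist only inside this script's opener; webbrowser.open() launches a separate browser process that never sees them, so the private page has to be fetched (and, if needed, saved or re-served) by the script itself.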

mechanize._mechanize.FormNotFoundError: no form matching name 'q'

喜夏-厌秋 submitted on 2019-11-28 10:41:09
Question: Can anyone help me get this form selection correct? Trying to crawl Google, I get the error: mechanize._mechanize.FormNotFoundError: no form matching name 'q'. This is unusual, since I have seen several other tutorials use it. P.S. I don't plan to slam Google with requests; I just hope to use an automatic selector to take the effort out of finding academic citation PDFs from time to time. <f GET http://www.google.com.tw/search application/x-www-form-urlencoded <HiddenControl(ie=Big5)
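The error arises because 'q' is the name of the *text control* inside Google's search form, not the name of the form itself. A sketch that selects the form by looking for a control called 'q' (using select_form's predicate keyword):

```python
def google_search(query):
    # Third-party dependency: pip install mechanize (0.4+ runs on Python 3).
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)  # google.com's robots.txt disallows scripts
    br.open("http://www.google.com/")
    # Select the form that *contains* a control named 'q', rather than
    # asking for a form *named* 'q' (no such form exists).
    br.select_form(predicate=lambda f: any(c.name == "q" for c in f.controls))
    br["q"] = query
    return br.submit().read()
```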

How do I grab a thumbnail screenshot of many websites?

浪子不回头ぞ submitted on 2019-11-28 10:39:45
Question: I have a list of 2500 websites and need to grab a thumbnail screenshot of each of them. How do I do that? I could try to parse the sites with either Perl or Python; Mechanize would be a good fit. But I am not so experienced with Perl. Answer 1: Here is a Perl solution: use WWW::Mechanize::Firefox; my $mech = WWW::Mechanize::Firefox->new(); $mech->get('http://google.com'); my $png = $mech->content_as_png(); From the docs: Returns the given tab or the current page rendered as a PNG image. All parameters are
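Since the rest of this digest is Python-centric, the same batch-screenshot job can be sketched with headless Chrome via Selenium (an assumption on my part; the original answer used Perl's WWW::Mechanize::Firefox):

```python
import os

def capture_thumbnails(urls, out_dir="thumbs", size=(1024, 768)):
    # Third-party dependency: pip install selenium (plus a Chrome driver).
    from selenium import webdriver

    os.makedirs(out_dir, exist_ok=True)
    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    opts.add_argument(f"--window-size={size[0]},{size[1]}")
    driver = webdriver.Chrome(options=opts)
    try:
        for i, url in enumerate(urls):
            try:
                driver.get(url)
                driver.save_screenshot(os.path.join(out_dir, f"{i:04d}.png"))
            except Exception:
                # With 2500 sites, some will time out or refuse; skip them.
                continue
    finally:
        driver.quit()
```

Reusing one driver across all 2500 sites avoids paying browser start-up cost per page.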

Submitting Forms with Mechanize (Python)

自闭症网瘾萝莉.ら submitted on 2019-11-28 10:11:58
Well, I am trying to login to a site using Python and mechanize. I've got the site opened: site = br.open("http://example.com/login.php") And I've got a list of the forms (with br.forms). <GET http://example.com/search.php application/x-www-form-urlencoded <HiddenControl(search=1) (readonly)> ... <POST http://example.com/login.php application/x-www-form-urlencoded <TextControl(username=)> <PasswordControl(password=)> <CheckboxControl(stay=[1])> <SubmitControl(<None>=Log in) (readonly)>> I've been trying to submit the username and password fields. I tried doing it like this: br.select_form(nr=0
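Given the form listing above, nr=0 selects the GET search form, so the POST login form is nr=1. A sketch under that assumption (the control names match the listing):

```python
def login(url, username, password):
    # Third-party dependency: pip install mechanize
    import mechanize

    br = mechanize.Browser()
    br.open(url)
    # nr counts forms in page order: nr=0 is the GET search form listed
    # first, so the POST login form shown above is nr=1.
    br.select_form(nr=1)
    br["username"] = username
    br["password"] = password
    return br.submit()
```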

How to properly use mechanize to scrape AJAX sites

蓝咒 submitted on 2019-11-28 09:08:59
Question: I am fairly new to web scraping. There is a site with a table on it whose values are controlled by JavaScript. Those values determine the addresses of future pages that the JavaScript tells my browser to request; those pages return JSON responses that the script uses to update the table in my browser. So I wanted to build a class with a mechanize method that takes in a URL and spits out the response body, the first time HTML; afterwards, the body response
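The usual trick for such AJAX tables is to skip the JavaScript entirely and call the JSON endpoints yourself, sharing one cookie-aware opener between the initial HTML page and the follow-up JSON requests. A stdlib sketch of the class the questioner describes (class and method names are my own):

```python
import json
import urllib.request
from http.cookiejar import CookieJar

class AjaxFetcher:
    """One cookie-aware opener shared by the HTML page and the JSON
    endpoints its JavaScript would normally call."""

    def __init__(self):
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar()))

    def body(self, url):
        # Raw response body: HTML for the first page, JSON afterwards.
        with self._opener.open(url) as resp:
            return resp.read()

def parse_rows(body):
    # Decode a JSON body into the table rows it carries.
    return json.loads(body)
```

The HTML response is parsed once to discover the JSON URLs; each JSON body then goes through parse_rows.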

Is it possible to hook up a more robust HTML parser to Python mechanize?

本小妞迷上赌 submitted on 2019-11-28 07:46:33
I am trying to parse and submit a form on a website using mechanize, but it appears that the built-in form parser cannot detect the form and its elements. I suspect it is choking on poorly formed HTML, and I'd like to try pre-parsing it with a parser better designed to handle bad HTML (say, lxml or BeautifulSoup) and then feeding the prettified, cleaned-up output to the form parser. I need mechanize not only for submitting the form but also for maintaining sessions (I'm working with this form from within a login session). I'm not sure how to go about doing this, if it is indeed possible. I'm
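This is possible: mechanize's response objects expose get_data/set_data, and Browser.set_response() re-runs the form parser on whatever you hand back, without dropping the session cookies. A sketch that cleans the page with BeautifulSoup first:

```python
def reparse_with_soup(br):
    # Third-party dependencies: pip install mechanize beautifulsoup4
    from bs4 import BeautifulSoup

    resp = br.response()
    soup = BeautifulSoup(resp.get_data(), "html.parser")
    # set_response() makes mechanize re-run its own form parser on the
    # tidied markup while keeping cookies and history intact.
    resp.set_data(soup.prettify().encode("utf-8"))
    br.set_response(resp)
    return br
```

After calling this on an open Browser, br.forms() and br.select_form() operate on the cleaned HTML.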

Python browser emulator with JS support [closed]

房东的猫 submitted on 2019-11-28 07:06:33
I want to grab some data from a site. Usually I use mechanize for such things, but now the site delivers the data with JavaScript. Alas, mechanize doesn't support it. What can I use instead? unutbu: Here are some options: Selenium (tutorial); for headless automation, Selenium can be used in conjunction with PhantomJS; WebKit; Spidermonkey. Here are some code examples: PyQt4 + WebKit; an example using PyQt4 + WebKit, redone with Selenium. Source: https://stackoverflow.com/questions/21777306/python-browser-emulator-with-js-support
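Of the options listed, Selenium is the most common replacement today (PhantomJS is unmaintained; headless Chrome fills that role now, an update beyond the original answer). A sketch that returns the DOM after JavaScript has run, which is exactly what mechanize cannot produce:

```python
def rendered_html(url):
    # Third-party dependency: pip install selenium (plus a Chrome driver).
    from selenium import webdriver

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        # page_source reflects the DOM *after* scripts have executed.
        return driver.page_source
    finally:
        driver.quit()
```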

Python's mechanize proxy support

不羁的心 submitted on 2019-11-28 06:36:51
I have a question about Python mechanize's proxy support. I'm writing a web client script, and I would like to add a proxy support function to it. For example, if I have: params = urllib.urlencode({'id':id, 'passwd':pw}) rq = mechanize.Request('http://www.example.com', params) rs = mechanize.urlopen(rq) How can I add proxy support to my mechanize script? Whenever I open the www.example.com website, I would like it to go through the proxy. Answer 1: You use mechanize.Request.set_proxy(host, type) (at least as of 0.1.11); assuming an HTTP proxy running at localhost:8888: req = mechanize
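Completing the answer's idea as a small helper (function name and the localhost:8888 proxy are illustrative):

```python
from urllib.parse import urlencode

def fetch_via_proxy(url, params, proxy="localhost:8888"):
    # Third-party dependency: pip install mechanize
    import mechanize

    req = mechanize.Request(url, urlencode(params).encode("utf-8"))
    # Route this single request through an HTTP proxy.
    req.set_proxy(proxy, "http")
    return mechanize.urlopen(req).read()
```

If you use a mechanize.Browser instead of raw Requests, br.set_proxies({"http": "localhost:8888"}) applies the proxy to every request the browser makes.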

Python unable to retrieve form with urllib or mechanize

不羁岁月 submitted on 2019-11-28 04:29:22
Question: I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php. The page is in Dutch, but that is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php. First of all, this is the urllib/urllib2 method I've tried:
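A common pitfall with this pattern is posting to the page that displays the form instead of the form's action URL. A Python 3 sketch that targets query.php directly (the field names are assumptions; they must be read from the form's HTML):

```python
import urllib.parse
import urllib.request

ACTION_URL = "http://zrs.leidenuniv.nl/ul/query.php"

def submit_form(fields):
    # POST to the form's *action* URL (query.php), not the page that
    # merely renders the form (start.php).
    data = urllib.parse.urlencode(fields).encode("utf-8")
    req = urllib.request.Request(ACTION_URL, data)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

If the site also requires a session cookie from start.php, wrap both requests in one cookie-aware opener as in the CookieJar example earlier in this digest.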

Python mechanize, following link by url and what is the nr parameter?

独自空忆成欢 submitted on 2019-11-28 03:24:31
I'm sorry to have to ask something like this, but Python mechanize's documentation seems to be really lacking and I can't figure this out. They only give one example that I can find for following a link: response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1) But I don't want to use a regex; I just want to follow a link based on its URL. How would I do this? Also, what is the "nr" parameter that is sometimes used when following links? Thanks for any info. Answer 1: br.follow_link takes either a Link object or a keyword arg (such as nr=0). br.links() lists all the links. br.links(url_regex='...') lists all the
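The keyword filters br.follow_link accepts are the same ones find_link takes, so matching an exact href needs no regex at all:

```python
def follow_by_url(br, target_url):
    # url= matches the link's href exactly (use url_regex= for patterns).
    # nr is the 0-based index among the links that *match the filters*:
    # nr=0 follows the first match, nr=1 the second, and so on.
    return br.follow_link(url=target_url, nr=0)
```

So in the documentation's own example, nr=1 means "the second link whose text matches the regex", not a form or page number.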