mechanize

Python browser emulator with JS support [closed]

醉酒当歌 submitted on 2019-11-27 01:43:16
Question: I want to grab some data from a site. Usually I use mechanize for such things, but now the site serves the data with JS. Alas, mechanize doesn't support it. What can I use instead?

Answer 1: Here are some options: Selenium (tutorial). For headless automation, Selenium can be used in conjunction with PhantomJS WebKit

What should I do if socket.setdefaulttimeout() is not working?

大憨熊 submitted on 2019-11-27 01:21:54
I'm writing a multi-threaded script to retrieve content from a website. The site is not very stable, so every now and then an HTTP request hangs and is not even timed out by socket.setdefaulttimeout(). Since I have no control over that website, the only thing I can do is improve my code, but I'm running out of ideas. Sample code:

    socket.setdefaulttimeout(150)
    MechBrowser = mechanize.Browser()
    Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
    Url = "http://example.com"
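
One possible explanation, offered here as an assumption rather than a diagnosis: the value passed to socket.setdefaulttimeout() applies to each blocking socket operation on sockets created afterwards, not to the request as a whole, so a server that keeps sending a few bytes at a time may never trigger it. A minimal sketch of one workaround is to run the fetch in a daemon thread and stop waiting after an overall deadline. The names call_with_deadline and slow_fetch are hypothetical and not part of mechanize or the socket module; slow_fetch only stands in for the real request.

```python
import threading
import time


def call_with_deadline(func, deadline_seconds, *args, **kwargs):
    """Run func in a daemon thread and wait at most deadline_seconds.

    Returns (True, result) if func finished in time, (False, None) otherwise.
    The worker is not killed on timeout; it is abandoned, and because it is
    a daemon thread it will not keep the interpreter alive at exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)  # wait for the whole call, not per recv()

    if thread.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]


def slow_fetch(seconds):
    """Hypothetical stand-in for an HTTP request that takes `seconds`."""
    time.sleep(seconds)
    return "done"
```

The abandoned thread keeps running in the background until the underlying call returns, so this bounds the wait, not the resource use.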

mechanize python click a button

做~自己de王妃 submitted on 2019-11-26 23:11:14
I have a form with an <input type="button" name="submit" /> button and would like to be able to click it. I have tried mech.form.click("submit"), but that gives the following error: ControlNotFoundError: no control matching kind 'clickable', id 'submit'. mech.submit() also doesn't work, since the control's type is button and not submit. Any ideas? Thanks.

Clicking a type="button" in a pure HTML form does nothing. For it to do anything, there must be JavaScript involved, and mechanize doesn't run JavaScript. So your options are: Read the JavaScript yourself and simulate with mechanize what it would be doing

Mechanize and Javascript

断了今生、忘了曾经 submitted on 2019-11-26 22:14:04
I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM events and AJAX, and so far I've found no way to do that. I looked at some Python client browsers that support JavaScript, like Spynner and Zope, and none of them really work for me. Spynner crashes PyQt all the time, and Zope doesn't seem to support JavaScript. Is there a way to simulate browsing with Python only (no extra processes), like WATIR or libraries that manipulate Firefox or Internet Explorer, while supporting JavaScript fully, as if actually browsing the page? I've played with this new

Use mechanize to submit form without control name

梦想的初衷 submitted on 2019-11-26 21:40:50
Question: I'm trying to use mechanize for Python to submit a form, but the form control I need to fill in doesn't have a name assigned to it.

    <POST https://sample.com/anExample multipart/form-data
      <HiddenControl(post_authenticity_token=) (readonly)>
      <HiddenControl(iframe_callback=) (readonly)>
      <TextareaControl(<None>=)>>

The control I'm trying to edit is the last control in the above object, <TextareaControl(<None>=)>. I've looked at the documentation and can't seem to find a way to assign a value to

Installing mechanize for python 3.4

China☆狼群 submitted on 2019-11-26 20:23:43
Question: I'm trying to get the mechanize module for Python 3.4. Can anybody point me in the right direction and perhaps walk me through the steps I would need to take to install it correctly? I'm currently using Windows 10.

Answer 1: Unfortunately, mechanize only works with Python 2.4, Python 2.5, Python 2.6, and Python 2.7. The good news is that there are other projects you can take a look at: RoboBrowser, MechanicalSoup. There are more alternatives in this thread as well: Are there

adding directory to sys.path /PYTHONPATH

五迷三道 submitted on 2019-11-26 19:40:49
I am trying to import a module from a particular directory. The problem is that if I use sys.path.append(mod_directory) to append the path and then open the Python interpreter, the directory mod_directory gets added to the end of the list sys.path. If I export the PYTHONPATH variable before opening the Python interpreter, the directory gets added to the start of the list. In the latter case I can import the module, but in the former I cannot. Can somebody explain why this is happening and give me a solution to add mod_directory to the start, inside a Python script?

Ned Deily This is
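
For the question above, a minimal sketch of putting a directory at the front of sys.path from inside a script, assuming the goal is for that directory to be searched before the other entries. The helper name prepend_to_sys_path is hypothetical. Changes to sys.path only affect the running process, so they have to be made in the same script that performs the import, before the import statement.

```python
import sys


def prepend_to_sys_path(directory):
    """Move or add `directory` to the front of sys.path."""
    # Drop an earlier occurrence so the entry is not listed twice.
    if directory in sys.path:
        sys.path.remove(directory)
    # Index 0 is searched first, unlike sys.path.append(), which adds
    # the directory at the end of the search order.
    sys.path.insert(0, directory)
```

Usage would look like prepend_to_sys_path(mod_directory) followed by the import of the module that lives in that directory.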

Is there a PHP equivalent of Perl's WWW::Mechanize?

痴心易碎 submitted on 2019-11-26 12:22:24
I'm looking for a library that has functionality similar to Perl's WWW::Mechanize, but for PHP. Basically, it should allow me to submit HTTP GET and POST requests with a simple syntax, then parse the resulting page and return, in a simple format, all forms and their fields, along with all links on the page. I know about CURL, but it's a little too barebones, and the syntax is pretty ugly (tons of curl_foo($curl_handle, ...) statements).

Clarification: I want something more high-level than the answers so far. For example, in Perl, you could do something like:

    # navigate to the main page
    $mech-

Using WWW::Mechanize to download a file to disk without loading it all in memory first

断了今生、忘了曾经 submitted on 2019-11-26 11:18:37
Question: I'm using Mechanize to facilitate the downloading of some files. At the moment my script uses the following line to actually download the files:

    agent.get('http://example.com/foo').save_as 'a_file_name'

However, this downloads the complete file into memory before dumping it to disk. How do you bypass this behavior and simply download straight to disk? If I need to use something other than WWW::Mechanize, how would I go about using WWW::Mechanize's cookies with it?

Answer 1: What you
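
The general idea of downloading straight to disk can be sketched as copying the response in fixed-size chunks instead of reading it all at once. The example below uses Python's standard library rather than the Mechanize library from the question, so it illustrates the technique only and is not that library's API. The function names, the 64 KiB chunk size, and the omission of cookie handling are all assumptions made for the sketch.

```python
import shutil
import urllib.request


def copy_stream_to_file(source, destination_path, chunk_size=64 * 1024):
    """Copy a readable binary stream to disk, chunk_size bytes at a time."""
    with open(destination_path, "wb") as out_file:
        # copyfileobj reads and writes in chunks, so memory use stays
        # around chunk_size regardless of how large the stream is.
        shutil.copyfileobj(source, out_file, chunk_size)


def download_to_disk(url, destination_path):
    """Stream the body of `url` to destination_path (no cookies sent)."""
    with urllib.request.urlopen(url) as response:
        copy_stream_to_file(response, destination_path)
```

A call such as download_to_disk("http://example.com/foo", "a_file_name") would mirror the line in the question, minus the session cookies.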

What should I do if socket.setdefaulttimeout() is not working?

你说的曾经没有我的故事 submitted on 2019-11-26 09:38:03
Question: I'm writing a multi-threaded script to retrieve content from a website. The site is not very stable, so every now and then an HTTP request hangs and is not even timed out by socket.setdefaulttimeout(). Since I have no control over that website, the only thing I can do is improve my code, but I'm running out of ideas. Sample code:

    socket.setdefaulttimeout(150)
    MechBrowser = mechanize.Browser()
    Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5